





EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT


VOLUME FIVE, NUMBER FOUR, WINTER 


Evidence on the Validity of the Armed Forces Institute Tests
of General Educational Development (College Level).
HENRY S. DYER


A Study of the Factor Structure of Thirteen Personality 
Variables. CONSTANCE LOVELL 


The Office of Radio Research: A Division of the Bureau of
Applied Social Research, Columbia University. MARJORIE
FISKE and PAUL F. LAZARSFELD 351


The Duties of Civil Service Examiners and Test Technicians 371 


The Arrangement of Choices in Multiple Choice Questions and
a Scheme for Randomizing Choices. CHARLES I. MOSIER
and HELEN G. PRICE


The Rationale of Temperament Testing. 


Mechanical Ability, Its Nature and Measurement. 
Manual Dexterity. J. R. WITTENBORN


Speed and Level Components in Time-Limit Scores: A Factor 
Analysis. WILLIAM M. DAVIDSON and JOHN B. CARROLL 411


The Use of an Objective Test in Predicting Rhetoric Grades. 
IRWIN A. BERG, GRAHAM JOHNSON, and ROBERT P. LARSEN 429


A Quick Graphic Method for Product Moment “r.” WILLIAM
LEROY JENKINS


Measurement News 
The Contributors 
Measurement Abstracts 


The Contents of This Issue Are Listed in the Education Index 














STATEMENT OF THE OWNERSHIP, MANAGEMENT, CIRCULATION, ETC.,
REQUIRED BY THE ACTS OF CONGRESS OF AUGUST 24, 1912, AND MARCH
3, 1933, OF EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, published
quarterly at Lancaster, Pennsylvania, for October 1, 1945. CITY OF WASHINGTON,
DISTRICT OF COLUMBIA.

Before me, a Notary Public in and for the State and county aforesaid, personally
appeared G. Frederic Kuder, who, having been duly sworn according to law, deposes
and says that he is the Editor of the EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT, and that the following is, to the best of his knowledge and belief,
a true statement of the ownership, management (and if a daily paper, the circulation),
etc., of the aforesaid publication for the date shown in the above caption,
required by the Act of August 24, 1912, as amended by the Act of March 3, 1933,
embodied in section 537, Postal Laws and Regulations, printed on the reverse of this
form, to wit:

1. That the names and addresses of the publisher, editor, managing editor, and
business managers are: Publisher, G. Frederic Kuder, 917 Fifteenth St., N.W., Wash-
ington, D. C. Editor, G. Frederic Kuder, 917 Fifteenth St., N.W., Washington, D. C.
Managing Editor, G. Frederic Kuder, 917 Fifteenth St., N.W., Washington, D. C.
Business Manager, G. Frederic Kuder, 917 Fifteenth St., N.W., Washington, D. C.

2. That the owner is: (If owned by a corporation, its name and address must be 
stated and also immediately thereunder the names and addresses of stockholders own- 
ing or holding one per cent or more of total amount of stock. If not owned by a 
corporation, the names and addresses of the individual owners must be given. If 
owned by a firm, company, or other unincorporated concern, its name and address, as 
well as those of each individual member, must be given.) G. Frederic Kuder, 917 
Fifteenth Street, N.W., Washington, D. C. 

3. That the known bondholders, mortgagees, and other security holders owning 
or holding 1 per cent or more of total amount of bonds, mortgages, or other securities 
are: (If there are none, so state.) None. 

4. That the two paragraphs next above, giving the names of the owners, stock-
holders, and security holders, if any, contain not only the list of stockholders and
security holders as they appear upon the books of the company but also, in cases
where the stockholder or security holder appears upon the books of the company as
trustee or in any other fiduciary relation, the name of the person or corporation for
whom such trustee is acting, is given; also that the said two paragraphs contain
statements embracing affiant’s full knowledge and belief as to the circumstances and
conditions under which stockholders and security holders who do not appear upon the
books of the company as trustees, hold stock and securities in a capacity other than
that of a bona fide owner; and this affiant has no reason to believe that any other
person, association, or corporation has any interest direct or indirect in the said
stock, bonds, or other securities than as so stated by him.

5. That the average number of copies of each issue of this publication sold or
distributed, through the mails or otherwise, to paid subscribers during the twelve
months preceding the date shown above is: not a daily. (This information is required
from daily publications only.)

Signed: G. Frederic Kuder, Editor. Sworn to and subscribed before me this
28th day of September, 1945. George L. Haines.

(Seal) (My commission expires January 31, 1949.)


PRINTED IN THE UNITED STATES OF AMERICA 
THE SCIENCE PRESS PRINTING COMPANY 
LANCASTER, PENNSYLVANIA 




















EVIDENCE ON THE VALIDITY OF THE ARMED 
FORCES INSTITUTE TESTS OF GENERAL 
EDUCATIONAL DEVELOPMENT 
(COLLEGE LEVEL) 


HENRY S. DYER 


Harvard University 


Purpose of the Study 


THE nature and purposes of the United States Armed Forces 
Institute Tests of General Educational Development are fully 
described in the Examiner’s Manual provided with the tests.¹
The battery consists of four tests as follows: 


Test 1. Correctness and Effectiveness of Expression. 

Test 2. Interpretation of Reading Materials in the Social 
Studies. 

Test 3. Interpretation of Reading Materials in the Natu- 
ral Sciences. 

Test 4. Interpretation of Literary Materials. 


All four of the tests are objective in form, the questions being
wholly of the multiple-choice type. There are two equivalent
forms of each test, one of which is to be administered exclu-
sively by the Armed Forces Institute (the Military Form) and
the other of which is available to colleges generally through the
American Council on Education (the Civilian Form).

According to the Examiner's Manual:


“The college level tests are intended for use primarily to
determine whether or not the individual tested is as capable
of carrying on advanced college work as the student who has
taken certain broad introductory or survey courses generally
offered in the first two years of the liberal arts college, or has
reached the same level of general educational development
as the student who has had such survey courses.”²

¹ U.S. Armed Forces Institute, Tests of General Educational Development (College
Level). Examiner's Manual. New York: American Council on Education, 1944.


The present study was undertaken to discover whether the 
results of the tests were sufficiently valid for use with veterans 
who might seek admission to Harvard after the war. Specifi- 
cally the answers to three questions have been sought: 


1. Do the test results provide a basis for placing students 
in advanced standing at Harvard? 

2. Do they provide a sound basis for the selection of candi- 
dates for admission to Harvard? 

3. Can they be used in counseling the veteran on his choice 
of a field of concentration? 


Crawford and Burnham³ made a study of Yale freshmen which
would lead one to expect an affirmative answer to the second
question. They found that the total of the standard scores on
the A.F.I. Tests “correlated as well with Freshman first term
averages in all courses, as did the average of all College Board
Achievement Tests.”⁴


Limitations of the Present Study 


With veterans actually seeking admission to colleges in increasing
numbers, it is not possible to wait for the appropriate amount and
kind of data to accumulate on the A.F.I. Tests before deciding
whether or not to use them. The data of the present study provide
no final answers to the questions proposed, but it is hoped that
they may furnish some clues useful to college administrative officers.

The group studied at Harvard was composed of undergraduates
whose educational careers, for the most part, had not been
interrupted by military service. Their performance on the tests
cannot, therefore, be considered as directly comparable to that
of the college-minded veteran who not only will have been away
from formal classroom work for some time, but who will also have
undergone experiences whose effect on his learning habits is, at
best, difficult to predict. Furthermore, the Harvard group took
the tests on a voluntary basis, motivated solely by patriotic
considerations and the hope of receiving one of a series of
monetary prizes. Under these conditions it was not expected that
the group would constitute a representative sample even of a
normal civilian undergraduate population. Its incentives could
hardly be considered similar to those of the returning veterans.

² Op. cit.,

³ A. B. Crawford and P. S. Burnham. “Trial at Yale University of the Armed
Forces Institute General Educational Development Tests.” Educational and Psycho-
logical Measurement, IV (1945), 261-270.

⁴ This comparison would have been more enlightening if the College Board’s
Scholastic Aptitude Test scores had been averaged in with the Achievement Test
scores.

There were 114 undergraduates who completed all four tests 
(civilian forms) and on whom there was the essential accessory 
information. The composition of this group is shown in Table 1. 


TABLE 1

Distribution of the Tested Group According to Class Standing and
Fields of Concentration*

                      Non-Scientific    Scientific
                          Fields          Fields      Totals
Freshmen ...........        29              30           59
Sophomores .........         7              10           17
Juniors ............        15              12           27
Seniors ............         6               5           11
Totals .............        57              57          114

* Since the field of concentration is not formally elected until the beginning of
the sophomore year, the freshmen were assigned to “probable fields” on the basis of
expressed preference.


Two indices were available by which the general scholastic 
ability of the tested group could be compared to that of a nor- 
mal prewar class. The first of these was the Verbal Score on 
the College Entrance Board Scholastic Aptitude Test which is 
taken by nearly all students as part of the entrance examina- 
tion. The second was the college rank at the end of the fresh- 
man year. The college rank is reported in seven groups: Group 
1 represents a straight A record, and Group 7 represents an 
unsatisfactory record. Table 2 shows how the tested group 
compares with the Class of 1942 on these two indices. The 
tested group is relatively overweighted with students whose 
interests lie along scientific lines, but this fact should not seri- 
ously impair the results of the study if each of the groups is 
considered separately. There is, however, little question that
the tested group on the whole is sufficiently above average in
scholastic ability that an allowance for the difference must be
made in any general application of the findings. The allowance
can be made with some confidence because the range of ability in
the tested group, as shown by the standard deviations, is not
unlike that of the normal group. In other words, the tested group
provides a reasonably good sampling of the less able as well as
of the superior students.
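The Total rows of Table 2 can be recovered from the two concentration groups with the usual formula for combining group means and standard deviations. The sketch below is an illustration, not the author's stated procedure; `pool_groups` is a hypothetical helper, and it assumes each group's σ was computed with divisor n (with groups this large the choice of divisor barely matters). Applied to the Class of 1942 Verbal Scores, it reproduces the printed 567.8 and 92.0.

```python
import math

def pool_groups(groups):
    """Combine (n, mean, sd) triples into a single (n, mean, sd).

    Assumes each sd uses divisor n, so the combined variance is the
    n-weighted average of sd**2 + (group mean - grand mean)**2.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    pooled_var = sum(n * (sd ** 2 + (m - grand_mean) ** 2)
                     for n, m, sd in groups) / total_n
    return total_n, grand_mean, math.sqrt(pooled_var)

# C.E.E.B. Verbal Score, Class of 1942 (Table 2): Non-Scientific, Scientific
n, mean, sd = pool_groups([(624, 574.8, 92.0), (265, 551.3, 90.0)])
print(n, round(mean, 1), round(sd, 1))
```

The between-group term is what keeps the pooled σ from being a simple average of the two group σ's; here it adds only a little because the group means are close.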
There is one further important limitation on the present
study. Any findings related to the usability of the A.F.I. Tests
for placing students in advanced standing will probably not be
generally applicable to colleges where the program of study and
the system of promotion are unlike those at Harvard. The
students in this study have not been exposed to so-called
“survey” courses. Ordinarily, a student in Harvard College, by the
time he completes his sophomore year, takes at least one course
in the natural sciences, one in the social sciences, and one in the
humanities. He is free to select any one of a large number of
courses in each of these areas. There is thus no guarantee that
he will have obtained a broad acquaintance with the material
in any given area. The one exception to this rule is that practi-
cally all freshmen are required to take a course in English
composition.


TABLE 2

General Scholastic Ability of the Tested Group Compared with That
of the Class of 1942

Field of          No. of Cases          C.E.E.B. Verbal Score             College Rank (Freshman Year)
Concen-         Tested   Class of    Tested Group    Class of 1942     Tested Group    Class of 1942
tration         Group    1942        M       σ       M       σ         M      σ        M      σ
Non-Scientific    57      624        640.5   91.8    574.8   92.0      3.77   1.36     4.44   1.37
Scientific        57      265        637.2  103.6    551.3   90.0      3.95   1.76     4.18   1.45
Total            114      889        638.9   97.9    567.8   92.0      3.86   1.58     4.36   1.40


Value of the A.F.I. Tests for Placement in 
Advanced Standing 


In order to determine whether it is necessary in this study 
to differentiate between freshmen and upperclassmen, it has 
seemed advisable to look first into the question of whether the
A.F.I. Tests show any relationship to the number of terms of 
work that a student has completed in college. In other words, 
do the tests measure educational development as it is conceived 
at Harvard? 

The representation from each class was so small and the 
difference in average scholastic ability among the several groups 
was so large that a simple comparison of mean scores from class 
to class would be all but meaningless. Therefore, in order to 
secure a reasonably adequate answer to the question of rela- 
tionship between the test scores and the amount of academic 
work completed, the method of partial correlations has been 
employed. Although partial correlations in the present in- 
stance will not provide an absolutely rigorous statement of the 
situation, it is believed that they provide a practical approxi- 
mation of the true picture. 

We wish to know the correlations between the A.F.I. Test 
scores and number of college terms completed when general 
scholastic ability is held constant. For this purpose, it is essen- 
tial that the measure of scholastic ability shall be based on 
evidence obtained before the student entered college. Such a 
measure is found in the Predicted Rank List Standing (PRL), 
an index computed routinely for every applicant to Harvard 
College. The PRL is a composite index based upon the appli- 
cant’s secondary-school class rank (hereinafter called School 
Rank) and his scores on the College Entrance Board exami- 
nations. It normally has a correlation of about .65 with Col- 
lege Rank at the close of the freshman year. It is expressed in 
the same terms as the College Rank, that is, a PRL of 1 repre- 
sents highly superior ability and a PRL of 7 represents inferior 
ability. 

Table 3 shows the partial correlations between the various
A.F.I. Test scores and the number of terms completed in col-
lege, with initial scholastic ability, as measured by the PRL,
held constant.⁵

⁵ The School Rank is converted to a standard score with mean of 85 and standard
deviation of 5. Complete tables of zero-order correlations will be found in Tables
8, 9, and 10 at the end of this article.
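The entries of Table 3 can be checked against the zero-order coefficients with the standard first-order partial correlation formula, r₁₂.₃ = (r₁₂ − r₁₃r₂₃) / √((1 − r₁₃²)(1 − r₂₃²)). In this minimal sketch `partial_r` is an illustrative name, and the inputs are the Non-Scientific Group coefficients as they can be read from Table 8: .33 between Test 2 and terms completed, .59 between Test 2 and the PRL, .34 between the PRL and terms completed (and .35 and .73 for the Total Score). Both results agree with Table 3.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Non-Scientific Group: y = terms completed, z = PRL (initial ability)
r_test2 = partial_r(r_xy=0.33, r_xz=0.59, r_yz=0.34)  # x = Test 2 (Social Studies)
r_total = partial_r(r_xy=0.35, r_xz=0.73, r_yz=0.34)  # x = Total Score
print(round(r_test2, 2), round(r_total, 2))  # 0.17 0.16, as in Table 3
```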














326 EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT 


Using Guilford’s “Table of Significant Values of r, R, and
t,”⁶ we find that two of the partial correlations in Table 3 can
be considered statistically significant. For the Scientific Group,
the partial correlation of .28 between the Social Studies Test
and terms completed is above the five per cent level of confi-
dence and that between Total Score and terms completed (.24)
is above the one per cent level. None of the partial correla-
tions found for the Non-Scientific Group is statistically signifi-
cant. In other words, the present data suggest that, to a small
extent, the A.F.I. Tests measure the educational development
of students concentrating in science, but not of students con-
centrating in social studies and humanities. However, in view
of the small magnitude of even the significant relationships, the
findings on the present group indicate that the A.F.I. Tests of
General Educational Development cannot be used as a basis
for placing students in advanced standing in Harvard College
unless a fundamental change is made in the principles gov-
erning promotion from one class to another. The value of the
tests at colleges that offer specific courses in general education
remains to be determined.


TABLE 3

Partial Correlations between A.F.I. Test Scores and Number of Terms Completed in
College, with Initial Ability Held Constant

                                 Non-Scientific Group    Scientific Group
                                       (N = 57)              (N = 57)
                                          r                     r
Test 1 (Expression) ........             .03                   .05
Test 2 (Social Studies) ....             .17                   .28
Test 3 (Natural Science) ...             .12                   .23
Test 4 (Literature) ........             .14                   .01
Total Score* ...............             .16                   .24

* Total Score was obtained by summing the standard scores on the four tests.
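A table of significant values of r, such as Guilford's, should be equivalent to testing t = r√(df / (1 − r²)) against the t distribution; for a first-order partial correlation the degrees of freedom are N − 3. The sketch below assumes that convention (`t_for_r` is a hypothetical helper) and computes t for the Scientific Group's partial r of .28. The two-tailed five per cent critical value of t with 54 degrees of freedom is about 2.00.

```python
import math

def t_for_r(r, n, n_controlled=1):
    """Return (t, df) for testing a (partial) correlation against zero.

    df = n - 2 - n_controlled: 54 for N = 57 with one variable
    (the PRL) held constant.
    """
    df = n - 2 - n_controlled
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Scientific Group, Test 2 (Social Studies) vs. terms completed (Table 3)
t, df = t_for_r(0.28, 57)
print(round(t, 2), df)  # compare with the critical t of about 2.00 for 54 df
```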


Value of the A.F.I. Tests for Selecting Students for Admission

In view of the small relationships found between the A.F.I.
scores and the number of terms completed in college, it was felt
that in studying the results further, the factor of number of
terms completed could be disregarded. When selecting students
for college study, one ordinarily tries to find the measures or
combination of measures that are most predictive of academic
success, where academic success is itself measured in some such
terms as average freshman grades, grade-point averages, and the
like. Mention has been made above of the Predicted Rank List
(PRL) as the pre-admission index having the highest known
correlation with College Rank at Harvard. Since the PRL is a
composite of School Rank and College Entrance Board examination
scores, we shall combine the A.F.I. Test scores with School Rank
in the same manner and compare the two composites on the basis
of the degree to which they predict the College Ranks that were
assigned at the close of the term in which the A.F.I. Tests were
taken. Table 4 shows how the two composites compare with each
other.

⁶ J. P. Guilford. Psychometric Methods. New York: McGraw-Hill Book
Company, 1936. Pp. 548-9.


TABLE 4

Correlations with College Rank at the Close of the Term in Which the
A.F.I. Tests Were Given*

                                   Non-Scientific    Scientific     Total
                                       Group           Group        Group
                                      (N = 57)        (N = 57)    (N = 114)
                                         r               r            r
PRL ............................        .64             .71          .67
Total A.F.I. Score .............        .41             .52          .46
School Rank ....................        .62             .63          .62
Total A.F.I. + School Rank† ....        .65             .66          .65

* Certain of the correlation coefficients are technically negative, but the negative
signs have been omitted to avoid confusion in meaning.

† The values in this row are multiple R’s. They are thus not strictly compar-
able to the r’s obtained with the three other variables. That is, with a new sample
one would expect shrinkage in the multiple R’s beyond that to be expected in the
zero-order r’s.
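The multiple R's in the last row of Table 4 follow from three zero-order coefficients by the two-predictor formula R² = (r₁² + r₂² − 2r₁r₂r₁₂) / (1 − r₁₂²). The sketch below uses the Total Group values as they can be read from Tables 4 and 10 (.46 for the A.F.I. Total with College Rank, .62 for School Rank with College Rank, .49 between the two predictors) and recovers the printed .65. Dropping the negative signs, as Table 4 does, leaves the result unchanged because both validity coefficients carry the same sign. To illustrate the shrinkage mentioned in the footnote, it also applies the common adjusted-R² formula; that formula is an assumption of this sketch, since the article does not say how shrinkage would be estimated. `multiple_r` and `adjusted_r` are hypothetical helper names.

```python
import math

def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with the best linear composite of two predictors."""
    r_sq = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_sq)

def adjusted_r(r, n, k):
    """Shrunken R from the usual adjusted-R^2 formula (n cases, k predictors)."""
    r_sq_adj = 1 - (1 - r ** 2) * (n - 1) / (n - k - 1)
    return math.sqrt(max(r_sq_adj, 0.0))

# Total Group (N = 114): y = College Rank, 1 = Total A.F.I., 2 = School Rank
R = multiple_r(r_y1=0.46, r_y2=0.62, r_12=0.49)
print(round(R, 2), round(adjusted_r(R, n=114, k=2), 2))
```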

It is apparent from Table 4 that, for the tested group, the 
composite of the A.F.I. Total score and School Rank compares 
favorably with the PRL in the prediction of College Rank. It 
will be observed that the School Rank factor, which is common 
to both of the composites, accounts for a relatively large pro- 
portion of the predictable variance in both cases. One cannot 
say how far this factor will be affected by the interruption in 
education that the returning veteran will have experienced. 
Its predictive power will probably vary from one applicant to
another. However, it is scarcely a measure that one would wish
to discard altogether, since there is no reason to suppose that
the predictive power of the tests (both A.F.I. and College
Board) will not also suffer in a similar fashion and for many
of the same reasons.

Of some interest is the fact that while the PRL appears to
give a slightly better prediction for the Scientific Group than
for the Non-Scientific Group, the A.F.I.-School-Rank composite
seems to predict the academic performance of both groups about
equally well. The superiority of the PRL with science concen-
trators may well be due to the fact that the College Board
examination includes a test of mathematical aptitude, which is
missing in the A.F.I. series. Crawford and Burnham, in their
study at Yale, concluded that the College Board Mathematical
Aptitude Test is “probably indispensable for scientific or
engineering majors.”⁷

In general, the evidence from this portion of the study seems 
to indicate that the A.F.I. Tests are useful as an aid in selecting 
students capable of college work at Harvard. 


Value of the A.F.I. Tests as a Basis for Guidance


Do the A.F.I. Tests of General Educational Development 
provide the counselor with tools for advising the veteran with 
respect to his field of concentration? The data of the present 
study are inadequate to supply anything but the barest hint 
of an answer to this question. 

It is probably not wholly unreasonable to assume that the 
Non-Scientific Group has, on the average, more ability in the 
fields of its choice than the Scientific Group, and that the Scien- 
tific Group has, on the average, more ability in scientific sub- 
jects than the Non-Scientific Group. It remains to be seen 
whether these expected differences in ability are matched by 
differences in performance on the A.F.I. Tests. Table 5 pro- 
vides evidence on this point. 

One finds that the Non-Scientific Group consistently tends
to surpass the Scientific Group in the tests ordinarily associated
with its field of interest, i.e., Written English, Social Studies,
and Literature. With respect to the Social Studies and Litera-
ture tests, the differences between the two groups are statisti-
cally significant; that is, the likelihood is less than five per cent
that differences of this size would arise as a matter of chance.
Similarly, the difference between the mean scores on the Science
test is in the direction one would expect, and this difference is
of such size that one would expect it to occur as a matter of
chance less than once in a thousand times.

⁷ Op. cit., p. 268.


TABLE 5

Comparisons of Mean A.F.I. Scores Obtained by Two Groups of Concentrators

                            Non-Scientific Group     Scientific Group                 Critical
                                 (N = 57)                (N = 57)                      Ratio†    P
                            M₁*    σM₁    S.D.₁      M₂*    σM₂    S.D.₂    M₁ − M₂
Test 1 (Written English)    98.8   1.26    9.4       96.7   1.24    9.3      +2.1        1.2     .12
Test 2 (Social Studies)     72.2   1.47   11.0       68.5   1.48   11.1      +3.7        1.9     .03
Test 3 (Natural Science)    61.3   1.96   14.6       69.1   1.47   11.0      −7.8        3.2     .001
Test 4 (Literature)         66.8   1.34   10.0       63.5   1.29    9.6      +3.3        1.8     .04

* Raw scores were used in the computation of the means and standard deviations.

† The standard errors of the differences between the means were computed by
means of the formula:

$$\sigma_{M_1 - M_2} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}$$
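The critical ratios in Table 5 are the mean differences divided by the standard error given in the table's footnote. The sketch below recomputes the ratio for the Natural Science test; the P column appears consistent with a one-tailed normal probability, which is an inference from the printed values rather than something the article states. `critical_ratio` and `one_tailed_p` are hypothetical helper names.

```python
import math

def critical_ratio(m1, se1, m2, se2):
    """Difference of two independent means divided by its standard error."""
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    return (m1 - m2) / se_diff

def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate at |z|."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

# Test 3 (Natural Science), Table 5: Non-Scientific vs. Scientific Group
cr = critical_ratio(61.3, 1.96, 69.1, 1.47)
print(round(cr, 1), one_tailed_p(cr) < 0.001)
```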

The actual magnitude of these differences is, of course, not 
very large compared to the total range of the scores. However, 
if a student were to score relatively high on Tests 1, 2, and 4 
and low on Test 3, one might at least hazard a guess that his 
abilities were more like those of students studying in the broad 
area of social studies and humanities than like those of students 
pursuing the sciences as a major field of interest. 

Further evidence of the value of the A.F.I. Tests for guidance
purposes would require the correlations of the tests with
subsequent performance in actual fields of study. Unfortunately,
the present data are not sufficiently numerous to provide unbiased
criterion measures for each of the several fields. Nevertheless,
it seems reasonable to suppose that the College Rank assigned to
each student at the close of the term in which he took the A.F.I.
Tests provides a rough measure of performance in science for the
Scientific Group and a similarly rough measure of performance in
social studies and humanities for the Non-Scientific Group. The
basis of this supposition is shown in Table 6.


TABLE 6

Courses Taken by Two Groups of Concentrators in the Term When
A.F.I. Tests Were Given

                       Non-Scientific Group        Scientific Group
                            (N = 57)                   (N = 57)
                       No. of     Average per     No. of     Average per
                       Courses    Student         Courses    Student
Natural Science ...       36          .6             136         2.4
Social Studies ....       89         1.6              29          .5
Humanities ........      114         2.0              81         1.4
Total .............      239         4.2             246         4.3
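The per-student averages in Table 6 are course counts divided by the 57 students in each group. This short check recomputes them, together with the 3.6 social-studies-and-humanities courses per Non-Scientific student cited in the next paragraph; the dictionary layout is only an illustrative way of holding the table.

```python
# Course counts from Table 6 for the term in which the A.F.I. Tests were given.
N = 57  # students in each group
courses = {
    "Non-Scientific": {"Natural Science": 36, "Social Studies": 89, "Humanities": 114},
    "Scientific": {"Natural Science": 136, "Social Studies": 29, "Humanities": 81},
}

for group, counts in courses.items():
    total = sum(counts.values())
    per_student = {area: round(c / N, 1) for area, c in counts.items()}
    print(group, total, round(total / N, 1), per_student)

# Social studies plus humanities per Non-Scientific student
print(round((89 + 114) / N, 1))
```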

On the average, the College Ranks for students in the Non-
Scientific Group were computed on the basis of 4.2 courses, of
which 3.6 were taken in the social studies and humanities areas,
and College Ranks for the Scientific Group were computed on
the basis of 4.3 courses, of which 2.4 were taken in the natural
science area. Clearly, the College Rank does not provide a
“pure” criterion measure, but correlations based upon it may
be considered approximately indicative of the predictive power
of the several tests. One would expect to find that the Science
test has a higher correlation with the College Rank of the Scien-
tific Group than with that of the Non-Scientific Group, and
that the other three tests have a higher correlation with the
College Rank of the Non-Scientific Group than with that of the
Scientific Group. Table 7 gives the correlations for each group.


TABLE 7

Correlations of Each of the A.F.I. Tests with College Rank

                              Non-Scientific Group    Scientific Group               Critical
                                   (N = 57)               (N = 57)                    Ratio     P
                                 r*       z₁            r*       z₂       z₁ − z₂
Test 1 (Written English) ...    .38      .40            ?        ?          .17         .9     .18
Test 2 (Social Studies) ....    .26      .27           .49      .54        −.27        1.4     .08
Test 3 (Natural Science) ...    .29      .30           .58      .66        −.36        1.9     .03
Test 4 (Literature) ........    .48      .52           .34      .35         .17         .9     .18

* The correlation coefficients are technically negative, but the signs have been
dropped to avoid confusion in meaning. Each correlation has been converted to
Fisher’s z-value for the purpose of securing an exact test of the significance of the
differences. (See Fisher, R. A. Statistical Methods for Research Workers, New
York, 1941. pp. 190 ff.) Since each of the z-values is based on the same number of
cases, the standard error of z is a constant: .136. The standard error of the differ-
ence between each pair of z-values is also a constant: .192.

From Table 7, it is apparent that our expectations are borne 
out except in the case of the Social Studies Test. Here, for 
some reason, the Scientific Group correlation is higher than that 
of the Non-Scientific Group. It should be noted that this dif- 
ference, large as it is, is nevertheless not statistically significant. 
One reason for the low correlation obtained on this test for the 
Non-Scientific Group may be that the ceiling on the test is not 
sufficiently high for students interested in social studies. In 
other words, the test may not be able to differentiate so well 
among persons who are well read in the field as among persons 
for whom social questions are, on the average, a secondary con- 
cern. The present findings also suggest that the A.F.I. Social 
Studies Test may be useful with the science concentrators as a 
predictor of general academic performance. The same cannot 
be said for students with a non-scientific turn of mind. 

The one statistically significant difference in Table 7 is that 
between the two correlations involving the Natural Science 
Test. In this instance, the difference is in the direction ex- 
pected, and from the size of the correlation coefficient for the 
Scientific Group, it seems fairly clear that the A.F.I. Science 
Test has a genuine value for guidance purposes. 
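The significance test behind Table 7 transforms each r to Fisher's z = ½ ln((1 + r)/(1 − r)), which is what Python's `math.atanh` computes, and divides the difference of the z's by √(1/(N₁ − 3) + 1/(N₂ − 3)). With N = 57 in both groups this gives the constants .136 and .192 quoted in the table footnote. The sketch below reproduces the Natural Science row; `compare_correlations` is a hypothetical helper, and the one-tailed P is an assumption inferred from the printed P values.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Fisher z test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    cr = (z1 - z2) / se_diff
    p_one_tailed = 0.5 * math.erfc(abs(cr) / math.sqrt(2))
    return z1, z2, se_diff, cr, p_one_tailed

# Test 3 (Natural Science), Table 7: Non-Scientific r = .29, Scientific r = .58
z1, z2, se, cr, p = compare_correlations(0.29, 57, 0.58, 57)
print(round(z1, 2), round(z2, 2), round(se, 3), round(cr, 1), round(p, 2))
```

Working in z rather than r matters because the sampling error of r depends on its size, while that of z is nearly constant; this is why a single standard error serves every row of the table.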

As to the remaining two tests—Written English and the 
Interpretation of Literary Materials—the present study has 
produced no findings of any clear significance for prediction 
purposes. In a future study these two tests will be further 
investigated. 

Summary of the Findings 


The findings of this study are tentative and should be re-
garded with caution. The nature and size of the sample used
make impossible any definitive statement regarding the value
and proper use of the A.F.I. Tests of General Educational
Development. However, with specific reference to the group
tested, the results of the study suggest the following:
TABLE 8

Intercorrelations, Means, Standard Deviations
Non-Scientific Group (N = 57)

                                            A.F.I. Tests
                   School                                                  Terms     College
                    Rank    PRL   Test 1  Test 2  Test 3  Test 4  Total  Completed    Rank
School Rank ....             ?     .41     .26     .25     .36     .40      .28        .62
PRL ............                   .61     .59     .58     .66     .73      .34        .64
Test 1 .........                           .42     .58     .60      ?        ?         .38
Test 2 .........                                   .69     .68     .86      .33        .26
Test 3 .........                                           .64     .87      .29        .29
Test 4 .........                                                   .86      .32        .48
Total ..........                                                            .35        .41
Terms Completed                                                                         ?
Mean ...........    91.2    3.6    98.8    72.2    61.3    66.8   288.3     2.4        3.5
S.D. ...........     3.2     ?      9.4    11.0    14.6    10.0    22.2     2.0        1.5





1. Although the A.F.I. Tests show a statistically significant 
relationship with the amount of work completed in Harvard 
College by science concentrators, the magnitude of the relation- 
ship is so small that the tests do not provide a sound basis for 
placing such students in advanced standing under the present 
system of promotion. 

2. The A.F.I. Tests show no significant relationship with the 
amount of work completed by non-science concentrators in 
Harvard College. These two findings, however, should be 
interpreted in the light of the fact that the tested group was 
not exposed to a curriculum in “general education.” 


TABLE 9

Intercorrelations, Means, Standard Deviations
Scientific Group (N = 57)

                                            A.F.I. Tests
                   School                                                  Terms     College
                    Rank    PRL   Test 1  Test 2  Test 3  Test 4  Total  Completed    Rank

[Table entries illegible in the source scan.]
TABLE 10

Intercorrelations, Means, Standard Deviations
Total Group (N = 114)

                                            A.F.I. Tests
                   School                                                  Terms     College
                    Rank    PRL   Test 1  Test 2  Test 3  Test 4  Total  Completed    Rank
School Rank ....             ?     .39     .40     .36     .36     .49      .30        .62
PRL ............                   .56     .63     .61     .56     .74      .31        .67
Test 1 .........                           .46     .46     .52     .73      .23        .31
Test 2 .........                                   .63     .58     .86       ?         .39
Test 3 .........                                           .49     .80      .28        .35
Test 4 .........                                                   .79      .25        .42
Total ..........                                                            .36        .46
Terms Completed                                                                        .35
Mean ...........    91.2    3.5    97.8    70.4    65.2    65.2   287.3     2.3         ?
S.D. ...........     3.3    1.0     9.4    11.2    13.5    10.0    20.8     1.0        1.7





3. The total score of the A.F.I. Tests when used in combi- 
nation with the student’s school rank should provide a reason- 
ably good prediction of his subsequent academic success in 
College. 

4. For the tested group, the A.F.I. Tests appear to measure 
“general aptitude” rather than “general educational develop- 
ment.” 

5. On the average, the score patterns yielded by the A.F.I. 
battery appear to differentiate slightly the students interested 
primarily in the social studies and humanities from those inter- 
ested in the natural sciences. 

6. The A.F.I. Test in the Social Studies may be useful with
students of scientific bent as a predictor of general academic 
ability and development. 

7. The A.F.I. Test in the Natural Sciences provides a useful 
instrument for predicting college success in the sciences. 




















A STUDY OF THE FACTOR STRUCTURE OF 
THIRTEEN PERSONALITY VARIABLES 


CONSTANCE LOVELL 


University of Southern California


Introduction


THE purpose of this study was to make a factor analysis of 
the thirteen variables of personality measured by Guilford’s 
Inventory of Factors STDCR, the Guilford-Martin Inventory 
of Factors GAMIN, and the Guilford-Martin Personnel Inven- 
tory I. 

These inventories were constructed to measure those per- 
sonality characteristics which previous factor analysis and 
clinical work had shown to be important.² The original studies
showed that the thirteen factors were not completely indepen- 
dent of each other though they were sufficiently separate to 
make individual scores helpful. The present study has in- 
volved factor analysis of the correlations found between them 
for the purpose of determining the clusters into which they fall. 
In other words, it has been designed to investigate the nature 
of more generalized super-factors with which the specific and 
interrelated original factors are loaded. Because it is based on 
intercorrelations, it has involved giving all three inventories to 
a single group, consisting of 200 college students. 


The Inventories 


These three inventories provide measures of the following 
factors: 


1 The writer wishes to express her gratitude to Lt. Col. J. P. Guilford for his
helpful suggestions concerning this research.

2 J. P. Guilford and R. B. Guilford. “Personality Factors S, E, and M, and
Their Measurement,” Journal of Psychology, II (1936), 107-127; “Personality Factors
D, R, T, and A,” Journal of Abnormal and Social Psychology, XXXIV (1939),
21-36; “Personality Factors N and GD,” Journal of Abnormal and Social Psychology,
XXXIV (1939), 239-248.

C. I. Mosier. “A Factor Analysis of Certain Neurotic Tendencies,” Psychometrika,
II (1937), 263-287.
S—Social Introversion-Extraversion (sociability, tendency to
seek social contacts and to enjoy the company of others 
as against shyness, tendency to withdraw from social situ- 
ations and to be seclusive). 

T—Thinking Introversion-Extraversion (lack of introspec- 
tiveness and an extravertive orientation of the thinking 
process in contrast to an inclination to meditative think- 
ing, philosophizing, analyzing oneself and others, and an 
introspective disposition).

D—Depression (freedom from depression and possession of a 
cheerful, optimistic disposition versus a chronically de- 
pressed mood and possession of feelings of unworthiness 
and guilt). 

C—Cycloid disposition (stability of emotional reactions and 
moods and freedom from cycloid tendencies in contrast to 
strong emotional reactions, fluctuations in moods, and a 
disposition toward flightiness and instability). 

R—Rhathymia (a happy-go-lucky or carefree disposition, 
liveliness, and impulsiveness as against an inhibited dis- 
position and an over-control of the impulses). 

G—General activity (tendency to engage in vigorous overt 
action versus a tendency to inertness and a disinclination 
for overt activity). 

A—Ascendance-Submission (social leadership vs. social
passiveness).

M—Masculinity-femininity (masculinity of emotional and 
temperamental make-up versus femininity of make-up). 

I—Inferiority feelings (self-confidence and lack of inferiority 
feelings as against lack of confidence, under-evaluation of 
one’s self, and feelings of inadequacy and inferiority). 

N—Nervousness (tendency to be calm, unruffled, and relaxed 
in contrast to jumpiness, jitteriness, and a tendency to be 
easily distracted, irritated, and annoyed). 

O—Objectivity (tendency to view one’s self and surroundings 
objectively and dispassionately versus a tendency to take 
everything personally and subjectively and to be hyper- 
sensitive). 

Co—Cooperativeness (willingness to accept things and people 
as they are and a generally tolerant attitude as against
overcriticalness of people and things and an intolerant 
attitude). 

Ag—Agreeableness (lack of quarrelsomeness and a lack of 
domineering qualities in contrast to a belligerent, domi- 
neering attitude and an overreadiness to fight over trifles). 


In the construction of the inventories, the following general 
procedure was used.³ Items were formulated which appeared
to be diagnostic of each of the thirteen aspects of personality as 
defined by the previous work done. These were stated in ques- 
tion form, to be answered by “Yes,” “?,” or “No.” Preliminary 
scoring keys were set up on the basis of the best statistical evi- 
dence at hand. The questions were administered to groups of 
subjects (e.g., 500 employed individuals in the case of the Per- 
sonnel Inventory I). After the papers were scored with the 
preliminary keys the test of internal consistency was applied 
to every item. Those items which were not sufficiently diag- 
nostic were discarded. For the remaining items scoring weights 
were assigned in accordance with a method devised by
Guilford.⁴

Because of the possibility of faking answers to items, the
value of personality inventories has been questioned. Probably
no satisfactory answer can be given without consideration of
the purpose for which the inventories are used. Administering
inventories to a group of prospective employees, who know that
their chances of employment depend on their responses, may be
expected to yield different results from those obtained in a
situation where individuals are motivated by the desire to gain
additional information about their personalities.

Several investigations have been made in which students 
have been asked to take inventories twice.⁵ In one situation


3 The description of the construction of the inventories has been adapted from
the discussions found in the manuals for the three inventories.

4 J. P. Guilford. “A Simple Scoring Weight for Test Items and Its Reliability,”
Psychometrika, VI (1941), 367-374.

5 For example, R. G. Bernreuter, “Validity of the Personality Inventory,” Per- 
sonnel Journal, XI (1933), 383-386; C. Dowling, “Ability of College Students to 
Influence Scores on the Guilford-Martin Personnel Inventory,” unpublished research 
study, The University of Southern California, 1944; J. A. M. Kimber, “The Insight 
of College Students into the Items of a Personality Test,” unpublished doctor’s disser- 
tation, The University of Southern California, 1945; F. L. Ruch, “A Technique for 
Detecting Attempts to Fake Performance on the Self-inventory Type of Personality 
Test,” Studies in Personality, New York: McGraw-Hill Book Company, 1942. 
they have been requested to respond according to the way they
think they are; in the other, according to the way they would
like to be, the way they think a well-adjusted individual would
respond, or the way they think a good employee would respond.
Such studies have consistently revealed a difference in scores
between the two situations. The results show that responses
can be influenced in a given direction, but they also indicate
that students do not answer the items, under the ordinary
procedure of administration, so as to present the best
possible picture of themselves. Inasmuch as the present study
was conducted in a manner similar to the “normal” condition
in the above investigations, it is probably not unreasonable to
assume that a similar attitude toward the inventories was
present.

Such findings, of importance in relation to the matter of 
faking answers to items, are of course not decisive evidence of 
the validity of the inventories. More direct information has 
been obtained. In one study,⁶ inventory scores for factors S, T,
D, C, and R were correlated with self-ratings and with ratings
by close associates. The ratings for T and C were not reliable
enough to serve as criteria against which to validate scores
for those factors. For S, D, and R, the correlations were high
enough to indicate that the inventory scores were quite valid.

In another study, the validity of factor M was checked by 
comparing the distributions of the inventory scores of 50 males 
and 50 females not used in the original standardization group.⁷
Forty-six of the males were above the median of the distribu- 
tion of the scores of the two sexes combined and forty-six of the 
fifty females were below the median. The validity coefficient 
(phi) for the factor was .84. It was considered highly satis- 
factory in view of the fallibility of biological sex as a criterion 
of masculinity-femininity as a temperamental trait. 
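The phi coefficient quoted above can be recomputed from the counts given. Placing the 100 cases in a two-by-two table (above or below the combined median), 46 men fall above and 4 below, while 4 women fall above and 46 below. A minimal sketch of the usual phi formula, with a function name of our own:

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table
                    above median   below median
        group 1          a              b
        group 2          c              d
    """
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Men: 46 above, 4 below; women: 4 above, 46 below the combined median
print(round(phi_coefficient(46, 4, 4, 46), 2))  # 0.84
```

The computation reduces to 2100/2500, which is exactly the .84 reported.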

With the Personnel Inventory I, a study was made in which 
workers were classified into a “satisfactory” group and an “unsatisfactory”


6 J. P. Guilford and Howard Martin. “Age Differences and Sex Differences in
Some Introvertive and Emotional Traits,” Journal of General Psychology, XXXI
(1944), 219-229.

7 Description of this study is taken from the manual of directions for the inventory.
group on the basis of test results.⁸ The inventory
was taken under conditions in which the subjects were informed 
that their employment status would not depend on the results. 
Of 22 workers judged unsatisfactory by management, 68% 
were detected by the test. Of 26 workers judged satisfactory 
by management, 73% were correctly placed by the test. Other 
studies have yielded results in line with this one.⁹ The authors
of the inventory have pointed out that, in these preliminary 
studies, selection of unsatisfactory individuals was made in 
terms of arbitrary criteria and that more detailed study of the 
jobs in question might have led to the use of different cut-off 
points and greater success. In the manual of directions they 
urge that, for usage of this sort, critical scores be based on 
experience in the specific situation. 

Reliability coefficients for the inventories have been given 
in the manuals. They were computed by dividing the scored 
items for each factor into two random halves, computing Pear- 
son coefficients of correlation, and then estimating reliability 
coefficients by means of the Spearman-Brown formula. The 
reported reliability coefficients are as follows: S = .90, T = .84, 
D = .94, C = .88, R = .90, G = .89, A = .88, M = .85, I = .91, N = .89,
O = .83, Ag = .80, Co = .91. 
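The split-half procedure just described can be sketched as follows. This is an illustration under stated assumptions, not the manuals' computation: item responses are taken to be already weighted numeric scores (one list per person), the two halves are drawn with a seeded random shuffle, and the helper names are hypothetical.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimated reliability of the full test from the correlation of its two halves."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores, seed=0):
    """item_scores: one list of (weighted) item scores per person.
    Splits the items into two random halves, correlates the half-test
    totals, and steps the result up with the Spearman-Brown formula."""
    items = list(range(len(item_scores[0])))
    random.Random(seed).shuffle(items)
    half = len(items) // 2
    first, second = items[:half], items[half:]
    a = [sum(person[i] for i in first) for person in item_scores]
    b = [sum(person[i] for i in second) for person in item_scores]
    return spearman_brown(pearson_r(a, b))

# Working backwards from a reported coefficient: .80 (Ag) implies that the
# two halves correlated about .80 / (2 - .80).
print(round(0.80 / (2 - 0.80), 2))  # 0.67
```

The last line inverts the two-half formula, r_half = r_full / (2 - r_full), which shows how much lower the raw half-test correlations were than the coefficients listed above.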


Procedure 


The three personality inventories were administered, accord- 
ing to the directions in the manuals, to four elementary psychol- 
ogy classes at The University of Southern California. Before 
the inventories were given out an appeal for cooperation in 
securing accurate responses was made. The students were
informed that they would be given their results individually 
and that the scores they made would have no influence on their 
grades in the course. 

Two hundred and thirteen subjects completed all three in- 
ventories. They were divided, according to sex and nearest 
age, as follows: 

8 R. M. Dorcus. “A Brief Study of the Humm-Wadsworth Temperament Scale
and the Guilford-Martin Personnel Inventory in an Industrial Situation,” Journal of
Applied Psychology, XXVIII (1944), 302-307.

9 H. G. Martin. “Locating the Troublemaker with the Guilford-Martin Personnel
Inventory,” Journal of Applied Psychology, XXVIII (1944), 461-467.
Age:                15   20   25   30   35   40   45   50
Number of Men:       3    ?    ?    ?    ?    ?    ?    ?
Number of Women:     4   71    3    4    2    1    0    2


Of these cases those whose nearest age was 15 and those whose 
nearest age was 35 or above were dropped. This selection left 
200 cases: 122 men and 78 women. It was made because those 
at the extremes in age might give atypical results for college 
students and because the loss of such a small number would 
make no appreciable difference as far as statistical significance 
was concerned. 

Raw scores for each of the factors were determined for each 
subject. These were converted into scaled scores by means of 
the conversion tables in the manuals. The scaled scores (C 
scores) were originally set up on the groups used in standardization
to normalize the distributions for the various factors.¹⁰
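The published conversion tables are not reproduced here. As an illustration only, the sketch below normalizes raw scores within whatever group is supplied, assuming the usual C-scale convention (eleven steps from 0 to 10, mean 5, standard deviation 2). Because the manuals' tables were built on the standardization groups rather than on the group being scored, real C scores would generally differ from these; the function name is hypothetical.

```python
from statistics import NormalDist

def c_scores(raw):
    """Convert raw scores to normalized C scores (assumed mean 5, SD 2,
    clipped to 0..10), using each score's mid-rank percentile in this group."""
    n = len(raw)
    normal = NormalDist()
    result = []
    for x in raw:
        below = sum(1 for v in raw if v < x)
        tied = sum(1 for v in raw if v == x)
        percentile = (below + tied / 2) / n  # always strictly between 0 and 1
        z = normal.inv_cdf(percentile)       # normal deviate for that percentile
        result.append(min(10, max(0, round(5 + 2 * z))))
    return result

print(c_scores([1, 2, 3, 4, 5]))  # [2, 4, 5, 6, 8]
```

Normalizing through percentiles, rather than linearly rescaling raw scores, is what makes the C distribution approximately normal even when the raw distribution is skewed.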

Intercorrelations of each factor with every other factor were 
then computed, using the Pearson product-moment method. 
These intercorrelations are given in Table 1. Sixty-five of 
them were significant (at the 5% level), being .140 or greater. 
Sixty-two of them, .182 or greater, were very significant (at the 
1% level). 
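The two steps in this paragraph, computing product-moment correlations and judging them against a significance threshold, can be sketched as below. The threshold function is a large-sample approximation (standard error of r taken as 1/√(N - 1) under the hypothesis of zero correlation), not necessarily the method behind the values quoted in the text; the function names are ours.

```python
from math import sqrt
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def critical_r(n, alpha):
    """Approximate two-tailed critical |r| for testing r = 0 with n cases."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(n - 1)

print(round(critical_r(200, 0.05), 3))  # 0.139
print(round(critical_r(200, 0.01), 3))  # 0.183
```

For N = 200 the approximation lands within .001 of the thresholds quoted above (.140 and .182).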

The Thurstone method of factor analysis was used. Cen- 
troid factors were extracted according to the procedure given 


TABLE 1

Intercorrelations of Factor Scores

          T      D      C      R      G      A      M      I      N      O     Ag     Co
S      .423   .638   .439   .655   .379   .733   .101   .591   .384   .465   .140   .222
T             .645   .588   .300  -.070   .197   .212   .335   .391   .405   .169   .237
D                    .901   .228  -.040   .481   .315   .740   .710   .746   .337   .442
C                          -.021  -.188   .308   .330   .675   .701   .722   .351   .416
R                                     ?   .525   .089   .270   .079   .207  -.084  -.019
G                                         .438  -.067   .088  -.231  -.059  -.314  -.169
A                                                .256   .570   .325   .460   .001   .200
M                                                       .326   .348   .365   .006   .210
I                                                              .674   .746   .350   .448
N                                                                     .720   .470   .529
O                                                                            .495   .616
Ag                                                                                  .631





10 J. P. Guilford. Fundamental Statistics in Psychology and Education. New 
York: McGraw-Hill Book Company, 1942. Pp. 104-106. 
by Guilford.¹¹ In estimating communality the highest correlation
in each column was used. It was decided to continue
extraction as long as the range of factor loadings was at least 
-.20 to +.20. This criterion called for the extraction of six 
factors.¹² In the following discussion these are consistently
called “super-factors” to emphasize their distinction from the 
thirteen original inventory factors. 
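The extraction just described can be sketched in pure Python. This is a simplified rendering of Thurstone's centroid method, written for illustration with choices of our own. Variables are reflected one at a time until no off-diagonal column sum is negative. Diagonal entries are either fixed communalities or re-estimated before each factor as the highest absolute residual in the column. Extraction stops when no loading reaches .20 in absolute value; this is our reading of the criterion, consistent with footnote 12's remark that further factors would contribute less than .04 to any communality.

```python
from math import sqrt

def centroid_factors(r, communalities=None, min_loading=0.20, max_factors=10):
    """Sketch of Thurstone-style centroid extraction.

    r: symmetric correlation matrix as a list of lists.
    communalities: fixed diagonal entries; if None, each diagonal entry is
        re-estimated before every factor as the highest absolute off-diagonal
        entry in its column of the current residual matrix.
    Returns a list of factors, each a list of loadings (one per variable).
    """
    n = len(r)
    res = [[float(v) for v in row] for row in r]
    if communalities is not None:
        for i in range(n):
            res[i][i] = float(communalities[i])
    factors = []
    for _ in range(max_factors):
        if communalities is None:
            for i in range(n):
                res[i][i] = max(abs(res[i][k]) for k in range(n) if k != i)
        # Reflect one variable at a time until no off-diagonal column sum is
        # negative; each reflection strictly raises the off-diagonal total.
        m = [row[:] for row in res]
        signs = [1] * n
        while True:
            off = [sum(m[i][j] for i in range(n) if i != j) for j in range(n)]
            j = min(range(n), key=lambda k: off[k])
            if off[j] >= -1e-12:
                break
            for k in range(n):
                if k != j:
                    m[j][k], m[k][j] = -m[j][k], -m[k][j]
            signs[j] = -signs[j]
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        total = sum(col_sums)
        if total <= 1e-12:
            break
        # Centroid loadings, with the reflections undone.
        loadings = [signs[j] * col_sums[j] / sqrt(total) for j in range(n)]
        if max(abs(a) for a in loadings) < min_loading:
            break
        factors.append(loadings)
        # Residual matrix (diagonal included) for the next factor.
        for i in range(n):
            for j in range(n):
                res[i][j] -= loadings[i] * loadings[j]
    return factors

# Hypothetical one-factor matrix with known loadings .9, .8, .7.
known = [0.9, 0.8, 0.7]
r = [[a * b for b in known] for a in known]
print(centroid_factors(r, communalities=[a * a for a in known]))
```

With exact communalities the sketch recovers the single factor (approximately [0.9, 0.8, 0.7]) and then stops, because the residuals vanish. With the highest-correlation estimates the first loadings come out only approximately right, which is why the study re-ran the extraction with improved communalities.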

With these the communalities for the thirteen factors were 
found by computing the sum of the squares of the super-factor 
loadings for each. Comparison of these with the communali- 
ties estimated at the beginning of the analysis revealed one 
difference of .145, which was considered too large to be toler- 
ated. Accordingly a second set of extractions was made using 
the communalities obtained from the super-factor loadings of 
the first extractions. This time the largest discrepancy be- 
tween the estimated and obtained communalities was .038, 
which was considered well within the limits of toleration. 
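The communality check described in this paragraph is a sum of squares across each row of loadings, followed by a comparison with the values assumed at the start. The sketch below applies it to row S of Table 2 and to the S and T communalities listed there. The paper gives no fixed numerical tolerance (it judged .145 too large and .038 acceptable), so no threshold is built in; the function names are ours.

```python
def communality(loadings):
    """Sum of squared loadings of one variable on all extracted super-factors."""
    return sum(a * a for a in loadings)

def max_discrepancy(obtained, estimated):
    """Largest absolute difference between obtained and estimated communalities."""
    return max(abs(o - e) for o, e in zip(obtained, estimated))

# Row S of Table 2 (second extraction).
s_row = [0.761, -0.477, -0.166, -0.092, 0.226, 0.047]
print(round(communality(s_row), 3))  # 0.896
# Obtained vs. estimated communalities for S and T in Table 2.
print(round(max_discrepancy([0.896, 0.655], [0.876, 0.617]), 3))  # 0.038
```

The first result matches the obtained communality printed for S, and the second is the largest discrepancy mentioned in the text.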

TABLE 2

Centroid Super-factor Loadings and Communalities from Second Extraction

                 Super-factor loading             Obtained Estimated  Discrep-
Factor      I     II    III     IV      V     VI    commu-    commu-      ancy
                                                    nality    nality
S        .761  -.477  -.166  -.092   .226   .047      .896      .876      .020
T        .560   .109  -.521  -.123  -.198   .060      .655      .617      .038
D        .896   .197  -.292   .132   .076  -.057      .953      .969      .016
C        .780   .423  -.280   .260   .089  -.125      .957      .968      .011
R        .433  -.665  -.077  -.182  -.187   .084      .711      .698      .013
G        .119  -.740   .054   .076  -.116  -.242      .643      .620      .023
A        .668  -.498   .190   .195   .081   .114      .788      .812      .024
M           ?      ?      ?      ?      ?      ?      .360      .342      .018
I        .828   .050   .155   .180   .122  -.023      .760      .759      .001
N           ?   .319   .076   .039   .087   .166      .728      .743      .015
O        .846   .253   .188   .014  -.062  -.057      .822      .827      .005
Ag       .404   .445   .217  -.472   .188  -.116      .680      .657      .023
Co       .558   .367   .349  -.316  -.024  -.068      .673      .670      .003





11 J. P. Guilford. Psychometric Methods. New York: McGraw-Hill Book
Company, 1936. Pp. 478-488.

12 Comparison of the standard deviations of the residuals with the standard error 
of the average correlation indicated that not more than three factors should be ex- 
tracted. However, it is considered only a rough test. Coombs’ criterion gave incon- 
sistent results. Tucker’s criterion (revised) indicated that at least seven factors 
should be extracted. Because of the inconsistency of these results it was decided to 
continue extraction as long as the range of loadings was —.20 to +.20. Beyond that 
point (with a maximum contribution to communality of less than .04) it did not seem 
advisable to go. 
The centroid loadings from this analysis, used in the rotations
which followed, are given in Table 2 together with the
obtained communalities, the estimated communalities, and the 
discrepancies. 

Rotation of the axes was made graphically, according to the 
procedure given by Guilford.¹³ The aim was to minimize the
size and number of negative entries and to maximize the number
of vanishing entries.¹⁴ Rotation was continued until no
further improvement according to these criteria could be ob- 
tained. The super-factor loadings and communalities from 
the final rotation are given in Table 3. 
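The text describes graphical rotation, carried out by hand on plots of pairs of axes. A numerical counterpart of one such step is a rigid rotation of two columns of the loading matrix, which leaves every communality unchanged. The sketch below also counts negative and vanishing entries using the thresholds of footnote 14. The counting rule (ignoring negatives that are themselves vanishing), the 30-degree example angle, and all names are our own assumptions, not the author's procedure.

```python
from math import cos, sin, radians

def rotate_pair(loadings, p, q, degrees):
    """Rigidly rotate reference axes p and q through the given angle.
    loadings: one row of factor loadings per variable. Returns a new matrix."""
    t = radians(degrees)
    rotated = [row[:] for row in loadings]
    for row in rotated:
        x, y = row[p], row[q]
        row[p] = x * cos(t) + y * sin(t)
        row[q] = -x * sin(t) + y * cos(t)
    return rotated

def classify(loading):
    """Categories of footnote 14, after rounding to two decimals."""
    a = abs(round(loading, 2))
    if a >= 0.30:
        return "significant"
    if a >= 0.11:
        return "small"
    return "vanishing"

def rotation_criteria(loadings):
    """Return (non-vanishing negative entries, vanishing entries)."""
    flat = [a for row in loadings for a in row]
    negatives = sum(1 for a in flat if a < 0 and classify(a) != "vanishing")
    vanishing = sum(1 for a in flat if classify(a) == "vanishing")
    return negatives, vanishing

# Rows S, T, and D of Table 2, turned 30 degrees in the plane of axes I and II.
table2 = [
    [0.761, -0.477, -0.166, -0.092, 0.226, 0.047],
    [0.560, 0.109, -0.521, -0.123, -0.198, 0.060],
    [0.896, 0.197, -0.292, 0.132, 0.076, -0.057],
]
turned = rotate_pair(table2, 0, 1, 30)
print(rotation_criteria(table2), rotation_criteria(turned))
```

A full rotation would repeat such steps over many pairs of axes, keeping each turn that improves the two counts, until no pair can be improved further.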


TABLE 3

Super-factor Loadings and Communalities after Rotation

                 Super-factor loading               Commu-
Factor      I     II    III     IV      V     VI    nality
S        .704   .085   .422   .060   .245   .390     .8966
T        .084   .181   .625  -.053   .465  -.056     .6526
D        .233   .438   .813   .089   .093   .162     .9498
C        .017   .480   .841   .059  -.080   .090     .9560
R           ?   .045  -.027  -.061   .441   .072     .7116
G        .734  -.091  -.170  -.234   .000  -.088     .6385
A        .704   .383   .090   .001   .047   .370     .7895
M       -.017   .584   .066  -.050   .096   .038     .3589
I        .377   .537   .445   .240  -.092   .255     .7596
N        .003   .542   .488   .351   .065   .258     .7259
O        .248   .601   .471   .415   .018   .051     .8196
Ag      -.082   .060   .330   .748   .010   .012     .6789
Co       .058   .357   .250   .693   .040  -.042     .6769





Two negative loadings of more than vanishing size remained after
the final rotation. Trait G had a loading of -.170 on Super-factor III
and a loading of -.234 on Super-factor IV. In view of the fact that
this factor had four significant negative correlations with other
factors, failure to achieve a positive manifold through rotation is
not unreasonable. Negative loadings of G on these super-factors,
moreover, fit logically into the interpretation given to
them.¹⁵

13 J. P. Guilford, op. cit., 489-491 and 502-507.

14 Loadings of +.3 and above or of -.3 and below were considered significant in
naming super-factors; those from +.11 to +.29 and from -.11 to -.29 were considered
as different from zero but too small to be important in identification; and those
between +.10 and -.10 were regarded as vanishing.
15 See data describing Super-factors III and IV. 
Interpretation of Results


Listed below are the loadings of the thirteen factors on 
Super-factor I, in order from highest to lowest: 


Factor   Loading   Data describing Super-factor I
G          .734    Has tendency to engage in vigorous overt action
R             ?    Happy-go-lucky, lively, impulsive, uninhibited
S          .704    Sociable, has tendency to seek social contacts and
                   to enjoy company of others
A          .704    Tends to be social leader
I          .377    Self-confident
O          .248    Objective
D          .233    Cheerful, optimistic
T          .084    May or may not be introspective
Co         .058    Either tolerant or intolerant
C          .017    May have stable or unstable emotional reactions
N          .003    Either relaxed or nervous
M         -.017    Either masculine or feminine in emotional make-up
Ag        -.082    May or may not be quarrelsome and domineering


This super-factor has been identified tentatively as a drive- 
restraint variable. Those factors with sizable loadings on it 
appear to have in common an active approach to experience. 
The person with high scores on them tends to engage in vigor- 
ous overt action, to give relatively uninhibited expression to 
impulses, to seek social contacts, and to be a social leader. This 
super-factor gives the contrast between the individual who 
pushes out into activity as against the person who has to be 
forced into it. 

The other positive loadings, though not high enough for use 
in naming the super-factor, are in agreement with the identifi- 
cation made. One might expect that drive for response would 
tend to be accompanied by feelings of confidence in and opti- 
mism about reactions made, and that pressure for response 
might prevent an individual from becoming prey to hyper- 
sensitive reactions. 

Moreover, the vanishing loadings seem in accord with the 
identification. It appears logical to think of drive for response 
as being independent of degree of tolerance, emotional stability, 
nervousness, masculinity of make-up, and domineering ten- 
dency. The only loading that is difficult to fit into this picture 
is the vanishing one on T. However, if one thinks of T in terms 
of the differentiation it makes between extravertive and intro- 
vertive orientation of the thinking process rather than in terms 
















merely of tendency toward meditation, the vanishing loading 
seems more reasonable. 
Below, in order of size, are the loadings on Super-factor II. 









Factor   Loading   Data describing Super-factor II
O          .601    Objective
M          .584    Has masculine attitudes
N          .542    Calm, unruffled, relaxed
I          .537    Self-confident
C          .480    Has stable emotional reactions
D          .438    Cheerful, optimistic
A          .383    Is social leader
Co         .357    Tolerant
T          .181    Lacks introspectiveness
S          .085    May or may not be sociable
Ag         .060    May or may not be quarrelsome
R          .045    May or may not be happy-go-lucky
G         -.091    May or may not have tendency to engage
                   in vigorous overt action


This super-factor has been tentatively named a realism vari- 
able. The inventory factors with high loadings on it present 
a good picture of the impersonal and dispassionate realist. He 
views things objectively. He does not go to pieces at seeing a 
fish on a hook. He is calm, unruffled, and self-confident (for 
he is objective enough to know that his bad points aren’t his 
whole personality). Also, because of his objective and imper- 
sonal approach, he tends to have stable emotional reactions and 
not to become unduly depressed by passing disappointments. 
One might expect that such an individual might have some 
tendency toward tolerance and leadership, though the rela- 
tively low loadings of these factors are not unreasonable in light 
of the identification. The vanishing loadings present a logical 
addition to the description of this super-factor. It seems rea- 
sonable to think of this characteristic of realism as being inde- 
pendent of degree of sociability, impulsiveness and carefreeness, 
tendency to engage in vigorous overt action, and tendency 
toward quarrelsomeness. 

This super-factor presents a fairly good picture of reported 
sex differences in personality except for the low loading of social 
leadership, which is supposed to be more characteristic of men 
than of women. However, it seems preferable to name the 
variable in terms of the attitudes and reactions it involves 
rather than to call it simply “masculinity-femininity.” 

The loadings on Super-factor III, in order, are as follows:
Factor   Loading   Data describing Super-factor III
C          .841    Has stable emotional reactions
D          .813    Is cheerful, optimistic
T          .625    Lacks introspectiveness
N          .488    Is calm, unruffled
O          .471    Objective
I          .445    Self-confident
S          .422    Sociable
Ag         .330    Lacks quarrelsomeness
Co         .250    Tolerant
A          .090    May or may not be a social leader
M          .066    May or may not have masculine attitudes
R         -.027    May or may not be carefree
G         -.170    Less active than the average person


This super-factor has been defined tentatively as an emotionality
variable. At the low extreme on it would be the individual
characterized by hampering emotional excess. At the other
extreme (as indicated by the high loadings) would be found the
individual who is dependably cheerful and optimistic, free from
constant analysis of himself and others, with some tendency to
be (1) free of nervous habits, (2) lacking in hypersensitivity,
(3) self-confident, sociable, and tolerant, and (4) lacking in
domineering qualities. Such an individual might or might not be
a leader in social situations, masculine in his attitudes, and
uninhibited. It is logical to think that he might have some
tendency to be a “slow mover,” since a person with great drive
for activity would be likely to get into more upsetting
situations. However, the negative loading on G is not large
enough to merit much consideration in the naming of this
super-factor.

For Super-factor IV, the following loadings were found:
Factor   Loading   Data describing Super-factor IV
Ag         .748    Lacks quarrelsomeness and domineering qualities
Co         .693    Tolerant
O          .415    Objective
N          .351    Calm, unruffled, relaxed
I          .240    Self-confident
D          .089    May or may not be optimistic
S          .060    May or may not be sociable
C          .059    May or may not have stable emotional reactions
A          .001    May or may not be a social leader
M         -.050    May or may not have masculine attitudes
T         -.053    May or may not be introspective
R         -.061    May or may not be carefree
G         -.234    Less active than the average person

This super-factor has been identified tentatively as a social
adaptability variable. The factors with high loadings on it seem
to present a picture of the individual whose actions are influ- 
enced by the desire for smooth relations with others. He does 
not domineer over others or quarrel with them; he is tolerant 
of others’ beliefs; and he is objective in his interpretations (such 
objectivity being necessary for smooth adjustment to other 
people). Perhaps because such a person adapts himself to 
others easily he tends to be calm and relaxed and to be self- 
confident. 

The vanishing loadings fit readily into this picture. The 
person who is concerned with adapting himself to the responses 
of others may or may not be (1) cheerful, (2) desirous of going 
out of his way to seek social contacts, (3) high in leadership 
qualities (he might be either a good leader or a good follower), 
(4) masculine in attitudes, (5) introspective, (6) impulsive, 
or (7) stable in mood. Further, one might expect that such 
a person would tend to have rather low pressure for overt activ- 
ity, since it would make for fewer chances of disagreement with 
others. 

Super-factor V had the following loadings: 


Factor   Loading   Data describing Super-factor V
T          .465    Lacks introspectiveness
R          .441    Carefree, impulsive
S          .245    Sociable
M          .096    May or may not have masculine attitudes
D          .093    May or may not be cheerful
N          .065    May or may not be calm and relaxed
A          .047    May or may not be social leader
Co         .040    May or may not be tolerant
O          .018    May or may not be objective
Ag         .010    May or may not be domineering
G          .000    May or may not be vigorous in action
C         -.080    May or may not have stable emotional reactions
I         -.092    May or may not be self-confident


Below are the loadings for Super-factor VI: 


Factor   Loading   Data describing Super-factor VI
S          .390    Sociable
A          .370    Social leader
N          .258    Calm, relaxed
I          .255    Self-confident
D          .162    Cheerful, optimistic
C          .090    May or may not have stable emotional reactions
R          .072    May or may not be carefree, lively
O          .051    May or may not be objective
M          .038    May or may not have masculine attitudes
Ag         .012    May or may not be domineering
Co        -.042    May or may not be tolerant
T         -.056    May or may not be introspective
G         -.088    May or may not have tendency toward vigorous
                   overt action


These two super-factors are too weak to be of any importance.
Both are merely doublets, accounting for special correlations
(in addition to the influence of the other super-factors)
between T and R (Super-factor V) and between S and A (Super-factor VI).

One finding of particular interest in this study is the fact 
that no very general super-factor was located which might be 
called “tendency to give the desirable response” or “insight into 
the desirability of the response.” Opinion as to just what scores
on these thirteen factors a very well-adjusted person should
obtain would vary somewhat from individual to individual.
However, probably most persons would agree with the authors 
of the inventories that the following scores are desirable: high 
scores on S, D, C, A, I, N, O, Co, and Ag; middle scores on T, R, 
and G; and a score on M depending on sex. If individuals were 
answering the items in terms of their insight into the desira- 
bility of the items, one would expect to find a super-factor in 
which S, D, C, A, I, N, O, Co, and Ag had sizable loadings. 
Nothing approaching this was found. Apparently understand- 
ing of the desirability of certain responses did not have a 
marked influence on results. This finding is in line with the 
material cited early in the report concerning normal and special 
methods of administering inventories. 

The results of this study present interesting suggestions con- 
cerning the structure of personality. On the basis of the find- 
ings one may conceive of personality as consisting of hierarchies 
of habit systems of different degrees of independence and gen- 
erality. The smallest units are the habit systems tapped by 
individual items of the inventories. Many of them are inter- 
correlated. They fall into clusters because they have in com- 
mon some more general characteristic. These characteristics 
are not only less specific but are on the average more indepen- 
dent of each other. (Such are the thirteen factors measured by 
these three personality inventories.) They, in turn, are
intercorrelated to a certain extent. They fall into certain clusters
because of even more general factors they have in common. 
These super-factors are more separate from each other on the 
average than the less general habit systems.

More particularly, this study has indicated the following 
four general habit systems: drive, emotionality, realism, and 
social adaptability. In an orthogonal structure such as this, 
a person may stand at any position on the scale for any of these 
four factors. He might, for example, be high in social adapta- 
bility, low in realism, low in emotionality, and average in drive. 
A person with a moderately high score on social adaptability 
would tend to score high on both tolerance and agreeableness 
(the more specific habit systems which have this characteristic 
in common), because the two are positively correlated. How- 
ever, these correlations are low enough so that, in individual 
cases, there might be considerable disparity between standings 
on the two. Therefore separate scores for each are indicated. 
These, of course, are the factor scores from the inventory. 

For a more concise and more generalized picture of an indi- 
vidual’s personality than that provided by the thirteen factor 
scores one would want measures of the four super-factors. 
Equations for predicting such scores from the thirteen C scores 
have been set up using the Doolittle method. In this process 
an arbitrary mean (50) and an arbitrary standard deviation 
(10) for the super-factor scores have been assumed. More- 
over, only those traits with super-factor loadings of .5 or above 
have been used. These prediction equations are as follows:¹⁶

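$$
\begin{aligned}
\mathrm{I}   &= 28.370 + 1.78\,G + .682\,R + .851\,S + .891\,A \\
\mathrm{II}  &= 36.101 + .804\,O + 1.202\,M + .349\,N + .299\,I \\
\mathrm{III} &= 30.894 + 2.273\,C + .745\,D + .569\,T \\
\mathrm{IV}  &= 33.967 + 1.805\,Ag + 1.339\,Co
\end{aligned}
$$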
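The text gives the prediction equations but not the intermediate quantities behind them. The sketch below shows, under assumptions of our own, the kind of computation involved: standard regression weights obtained by elimination on the normal equations (the Doolittle method is a tabular hand layout of such an elimination), then converted to raw-score weights for a criterion scaled to mean 50 and standard deviation 10. For a super-factor, the predictor-criterion correlations would be the traits' loadings on it, since for orthogonal factors a loading equals the correlation between trait and factor. The sketch has not been checked against the published coefficients, which it could reproduce only with the study's own intercorrelations, means, and standard deviations. The second part simply applies the four equations as printed; all function names are hypothetical.

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination without pivoting
    (adequate for a positive-definite correlation matrix)."""
    n = len(b)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for k in range(n):                      # forward elimination
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):            # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def prediction_equation(r_xx, r_xy, means, sds, target_mean=50.0, target_sd=10.0):
    """Return (intercept, raw-score weights, multiple R) for predicting a criterion
    with the given mean and SD from predictors with the given means and SDs."""
    beta = solve(r_xx, r_xy)                # standard-score regression weights
    multiple_r = sqrt(sum(b * r for b, r in zip(beta, r_xy)))
    weights = [target_sd * b / s for b, s in zip(beta, sds)]
    intercept = target_mean - sum(w * m for w, m in zip(weights, means))
    return intercept, weights, multiple_r

# The four equations as printed (keys are inventory factor symbols).
EQUATIONS = {
    "I": (28.370, {"G": 1.78, "R": 0.682, "S": 0.851, "A": 0.891}),
    "II": (36.101, {"O": 0.804, "M": 1.202, "N": 0.349, "I": 0.299}),
    "III": (30.894, {"C": 2.273, "D": 0.745, "T": 0.569}),
    "IV": (33.967, {"Ag": 1.805, "Co": 1.339}),
}

def super_factor_scores(c_scores):
    """Apply the printed equations to a dict mapping factor symbols to C scores."""
    return {
        name: constant + sum(w * c_scores[f] for f, w in terms.items())
        for name, (constant, terms) in EQUATIONS.items()
    }

# Toy example (not the study's data): two uncorrelated predictors, each with
# mean 5 and SD 2, correlating .6 and .5 with the criterion.
print(prediction_equation([[1, 0], [0, 1]], [0.6, 0.5], [5, 5], [2, 2]))
```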


In addition to the above, one further result should be men- 
tioned. The original studies made by Guilford indicated that 
factors D and C were sufficiently independent to warrant sepa- 
rate measurement. Items were then constructed which ap- 
peared to be measuring each. Obviously these items were not 


16 The following multiple correlation coefficients were obtained: $R_{\mathrm{I}\cdot GRSA} = .888$;
$R_{\mathrm{II}\cdot OMNI} = .729$; $R_{\mathrm{III}\cdot CDT} = .860$; $R_{\mathrm{IV}\cdot Ag\,Co} = \ldots$
pure measures, for the correlation obtained in this study for the
scores on the two factors was .90. This indicates that addi- 
tional work on these two sets of items is necessary to bring the 
correlation between scores on the inventory closer to the corre- 
lation of the factors themselves, as found in the original re- 
search. There were a number of other correlations in the 
seventies. These were for factors in separate inventories for 
which no correlations from the original studies are available. It might be possible to
lower these somewhat by the removal of impure items. How- 
ever, in view of the general interpretation of the results of this 
study, one would not expect to eliminate all correlation even if 
perfectly pure items for each factor were used. And, as they 
stand now, the correlations are not high enough to enable accu- 
rate prediction of one factor from the other. 

The results as given are, of course, limited by the selection 
of subjects and the procedure used in the study. Generaliza- 
tion of these findings for college students to all individuals is 
not warranted. Further research, set up in similar form, should 
be done with non-college groups as subjects. In addition, the 
findings would be expected to apply only in cases where the 
inventories were given under the conditions of administration 
used in this investigation. One would predict different results, 
for example, if subjects were asked to take the inventories so 
as to indicate how a happy, well-adjusted person would re- 
spond. Factor analysis of scores obtained with such a pro- 
cedure would make an interesting study. 


Summary 


The purpose of this study was to make a factor analysis of 
the thirteen variables of personality measured by Guilford’s 
Inventory of Factors STDCR, the Guilford-Martin Inventory 
of Factors GAMIN, and the Guilford-Martin Personnel Inventory I.

The three inventories were administered to two hundred 
college students under standard conditions. The results ob- 
tained in the study are, of course, limited by these selective 
factors. 

For each of the subjects scaled scores were obtained on the 
following factors: sociability, extravertive orientation of the
thinking process, freedom from depression, stability of emo- 
tional reactions, carefreeness, general drive, social ascendance, 
masculinity, freedom from inferiority feelings, freedom from 
nervousness, objectivity, lack of quarrelsomeness, and toler- 
ance. Intercorrelations between the scores were then computed 
and a factor analysis of the results was made, using the Thur- 
stone method. Six super-factors were obtained. The first four 
were identified tentatively as: 


I. Drive-restraint (high loadings on general drive, care- 
freeness, sociability, and social ascendance). 

II. Realism (high loadings on objectivity, masculinity, 
freedom from nervousness, and freedom from inferi- 
ority feelings). 

III. Emotionality (high loadings on stability of emotional 
reactions, freedom from depression, and extravertive 
orientation of the thinking process). 

IV. Social adaptability (high loadings on lack of quarrel- 
someness and tolerance). 


The remaining two super-factors were doublets, accounting for 
special relations between two pairs of factors. No super-factor 
which seemed to involve insight into the desirability of the 
responses or tendency to give “good” responses was found. 

For the subjects used the results seem to picture the struc- 
ture of personality as consisting of four general areas, relatively 
independent of each other, within which lie less general habit 
systems (less independent, but sufficiently so to make sepa- 
rate scores advisable for diagnostic work). Equations for pre- 
dicting scores on the four super-factors from the thirteen factor 
scores were set up using the Doolittle method. 
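The Doolittle method is a systematic scheme for solving the normal equations of a multiple regression; in modern terms it corresponds closely to an LU factorization of the predictor intercorrelation matrix (L unit lower triangular, U upper triangular). The Python sketch below shows that computation under stated assumptions: the right-hand side holds the correlation of each predictor with the super-factor (for factor scores these would be the loadings), and the numbers in the usage example are hypothetical, not values from this study. It also converts standardized weights into the intercept-plus-weights form of the printed equations, assuming hypothetical predictor means and standard deviations. All function names are illustrative.

```python
def doolittle_lu(a):
    """Factor square matrix a into L (unit lower triangular) and U (upper).
    No pivoting: adequate for well-conditioned correlation matrices."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U.
        for k in range(i, n):
            upper[i][k] = a[i][k] - sum(lower[i][j] * upper[j][k] for j in range(i))
        lower[i][i] = 1.0
        # Column i of L, below the diagonal.
        for k in range(i + 1, n):
            lower[k][i] = (a[k][i] - sum(lower[k][j] * upper[j][i] for j in range(i))) / upper[i][i]
    return lower, upper


def solve(a, b):
    """Solve a x = b by forward substitution (L) then back substitution (U)."""
    lower, upper = doolittle_lu(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][j] * x[j] for j in range(i + 1, n))) / upper[i][i]
    return x


def regression_weights(r_xx, r_xy):
    """Standardized (beta) weights and multiple R, given predictor
    intercorrelations r_xx and predictor-criterion correlations r_xy."""
    beta = solve(r_xx, r_xy)
    r_squared = sum(b * r for b, r in zip(beta, r_xy))
    return beta, r_squared ** 0.5


def raw_score_equation(beta, pred_means, pred_sds, target_mean=50.0, target_sd=10.0):
    """Rewrite beta weights as 'intercept + sum(b_j * X_j)' on a scale with
    the given mean and SD (50 and 10 in the text)."""
    b = [w * target_sd / sd for w, sd in zip(beta, pred_sds)]
    intercept = target_mean - sum(bj * m for bj, m in zip(b, pred_means))
    return intercept, b


# Usage with hypothetical correlations, means, and SDs.
r_xx = [[1.0, 0.3], [0.3, 1.0]]
r_xy = [0.6, 0.5]
beta, multiple_r = regression_weights(r_xx, r_xy)
intercept, weights = raw_score_equation(beta, pred_means=[5.0, 5.0], pred_sds=[2.0, 2.0])
print(beta, multiple_r, intercept, weights)
```

Because a correlation matrix is symmetric and normally positive definite, elimination without row exchanges is usually safe here; a general-purpose solver would add pivoting.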

The high correlation between scores on C (stability of emo- 
tional reactions) and D (freedom from depression) indicated 
the advisability of revision of these two sets of items to bring 
them closer to the correlation originally found between the 
factors themselves. 


THE OFFICE OF RADIO RESEARCH:¹ A DIVISION OF
THE BUREAU OF APPLIED SOCIAL RESEARCH, 
COLUMBIA UNIVERSITY 


MARJORIE FISKE and PAUL F. LAZARSFELD
Office of Radio Research 


The rapid development of the radio industry in the past two
decades has opened up a new and increasingly important area
of communications research. Radio is accessible to more people
than any other kind of communication. Its effects are therefore
of growing importance to the sociologist, the educator and the
politician. And since radio in America is a privately owned and
operated industry, its impact is also a matter of importance to
networks, advertisers, advertising agencies and others concerned
with its commercial effectiveness.

The Office of Radio Research has been functioning for seven 
years, during which it has been concerned with a wide variety 
of radio research problems. In some instances these problems 
could be solved by a relatively simple adaptation of techniques 
used in other fields of research. In others it was necessary to 
develop new techniques to fit the special characteristics of this 
relatively new medium. It will not be possible within the 
scope of this chapter to enumerate all such adaptations and 
innovations. We shall, rather, touch upon a few which illustrate
the interrelationship of radio and other fields of communications
research.²

1 This article is a chapter of a book, How to Conduct Consumer and Opinion
Research, edited by A. B. Blankenship, which is to be published by Harper and
Brothers early in 1946.

2 The authors are indebted to Dr. Bernard Meyers for his assistance in organizing
this material.


The Objectives of Radio Research


It is manifestly impossible to study either the content or
the effect of every radio broadcast that goes on the air. The
larger body of knowledge, the over-all picture, has to be built
up over a period of time, segment upon segment, each segment 
representing a study of a particular program or a particular 
group of listeners. Furthermore, studies of particular pro- 
grams or particular groups of listeners may be done from the 
different standpoints of the educator, the politician, the soci- 
ologist, the psychologist and the businessman, each seeking 
answers to different questions. Rarely is one program or one
group of listeners studied from all of these viewpoints, but each
study contributes to the sum total of knowledge of radio’s rôle
in our culture, and each contributes new techniques which can be
used by the others. Altogether, these studies are gradually being
interwoven to form an increasingly important part of the general
pattern of communications research.

Some Examples of the Several Approaches to Radio Re- 
search.—The Office of Radio Research has had occasion to do 
research from all of these several standpoints. In studies re- 
ported in Radio and the Printed Page (11) radio was viewed 
from the standpoint of the educator; reading and listening 
habits were compared and new insights gained as to the relative
rôles of radio and print as informational media. The Office has
studied a particular program, the Orson Welles “Invasion from
Mars” broadcast, for example, from the standpoint of the psy- 
chologist to gain insight into how different kinds of people react 
to a given stimulus situation (3). This broadcast, it will be 
recalled, resulted in near panic in certain parts of the country. 
To determine how it came about that a radio drama could 
spread genuine terror through a substantial part of the popula- 
tion, detailed interviews were made with many different kinds 
of listeners who reacted in a variety of ways. The results not 
only shed light on the power and potentialities of radio as a 
medium of communication but enabled the psychologist to gain 
insight into the general psychology of panic. 

Again, certain programs and certain kinds of listeners have 
been studied from a broad sociological standpoint as in the not 
yet published study³ of the Kate Smith all-day bond-selling
appeal, wherein a particular broadcast of a popular entertainer
was subjected to scrutiny. In this study the content of the
program was analyzed to determine the variety of its appeals,
and the listeners were interviewed at length to determine the
relative impact of these appeals and how they came to decide
to buy a bond from Kate that day. In the course of the analysis
it became apparent that a particular radio entertainer may
epitomize the social force of radio, reflecting certain trends and
concepts in our culture and at the same time reinforcing them.
Kate, for example, lays great stress on the sacrifices and sacredness
of motherhood. Indeed, for many of her listeners she has
come to typify motherhood: “Only a mother could plead the
way she does,” even though most of them know she is not a
mother. In this extolling of motherhood she not only reflects
one of the basic concepts of our time and our culture, but at the
same time she reinforces and strengthens it.

3 “Swayed by Smith,” a chapter in The Social Psychology of Mass Persuasion, by
Robert K. Merton, with the assistance of Marjorie Fiske and Alberta Curtis. To be
published by Harpers early in 1946.

The program sponsor and the advertising agency study 
radio from yet another angle. They are concerned with the 
extent to which their “messages” get across and with the extent 
of acceptance or rejection of their particular programs. Here 
the research focuses on the immediate response-reactions of the 
listeners. An effort is made to determine the extent to which 
a program is liked or disliked and why, and to find out its effect 
on the subsequent attitudes or actions of those who heard it. 
Studies of this kind involve either program research or the test- 
ing of commercials, techniques which we shall consider shortly. 

Viewing radio as a whole, it is apparent that its content is
the product of contemporary culture and customs. Its content 
reflects this culture because the people responsible for it are in 
turn the products of it. Analyses of the content of radio broad- 
casts, therefore, provide the sociologist with a better under- 
standing of society. On the other hand, this content has an 
impact on millions of radio listeners, and by studying the effects 
of radio on listeners’ habits and attitudes the sociologist also 
gains insight into the way in which radio is changing, modifying or
reinforcing the cultural pattern. 


The Techniques of Radio Research 


Surveys of General Listening Habits.—The most general
kind of listener survey involves a simple count of how many
people listen to a given program. Such counts, known as program
ratings, are based on careful samplings of the population
and are made systematically by a number of research organizations.
Program ratings and their fluctuations are thus made available
regularly to private clients.⁴ The Office of Radio Research,
however, while it does make use of such ratings in some of its 
more quantitative studies, e.g., “The Social Stratification of the 
Radio Audience” by Hugh Beville (2), confines itself largely 
to studies of a more detailed psychological nature. 

Surveys of general listening habits may be made for several 
reasons. One may wish to compare the influence of the various 
media of communication on a given area of behavior. How, for 
example, does the influence of radio compare with the influence 
of newspapers and magazines on voting behavior (13)? Or
one may want to measure changes in listening habits resulting
from program changes, or to gain insight into the rôle of radio
among certain groups of people. In such cases one must also
survey general listening habits over a period of time. Whatever
the purpose of such general surveys, the procedure is the same: 
something akin to a “listening diary” must be procured from 
a representative sample of the population one wants to study. 

Several studies of this kind, both commercial and non-com- 
mercial, have been made at the Office of Radio Research. A 
good example of this is the question of listening in the daytime.
If the whole listening pattern of women is taken into consideration,
they fall into three equally large groups: two groups of
Daytime Listeners, those who listen to serials and those who
listen to other programs, and the Non-Listeners, who are at home
in the morning and could listen but do not. Thus radio actually
does not reach a third of the available morning audience and, at
the same time, has to cater to two rather different kinds of
audiences. How to reconcile the interests of these two divergent
sectors of the audience raises a number of interesting and still
partly unsolved research problems.

Another survey of general listening habits was concerned
with the question of who listens to small local stations. This
involved interviewing a cross section of the radio audience in a
given locality about their radio listenership over a period of
time, and comparing different age, sex and socio-economic
groups in respect to station preferences (16). It developed
that people in the lower socio-economic strata tend to listen to
such stations more than do those who are better educated and
better situated financially.

4 See Chapters XI, XII, XIII, and XIV.

A study of a somewhat different nature recently completed 
by the Office also falls into this category. Here the problem 
was to determine (a) the degree of satisfaction with current 
radio offerings, (b) attitudes toward commercial advertising on 
the radio, and (c) general receptivity toward a proposed new 
plan by which the listener would subscribe to a service which 
would provide him with three types of radio programs without 
any advertising. 

Still another kind of general listenership survey is designed 
to determine the rôle of radio in the lives of particular groups,
such as children, housewives, or certain socio-economic
groups. Such a study usually involves careful and detailed case
histories of representatives of the group under study. This 
kind of investigation is well exemplified in two Office of Radio 
Research studies, “Listeners Appraise a College Station” (4), 
and “Radio Comes to the Farmer” (19). In the latter, it was 
possible, by use of the detailed interview method, to determine 
the extent to which the acquisition of a radio changed the habit 
and thought patterns of a group of Iowa farm households. 

A similar type of survey is indicated when a sponsor or a
group of sponsors wants to measure the impact of radio appeals
or to compare it with appeals in other media (12). It was
found possible in one study, for example, both to gauge the
actual results of radio advertising and to get some idea of its
potentialities (6). The investigators went into a representa- 
tive, moderate-sized community and interviewed several hun- 
dred housewives at great length about their listening habits, 
their awareness of retail merchandising over the air, their vary- 
ing degrees of receptiveness toward the various kinds of retail 
advertising and the extent to which such attitudes influence 
their buying habits. Among other things this study indicated 
that there are certain kinds of programs which are better
adapted to the selling of retail merchandise than others, and 
that certain kinds of merchandise lend themselves better than 
others to advertising over the air. 

In the course of such studies it has become clear that non- 
listeners are also an important factor in radio research, both 
from the standpoint of particular programs (17) and from the 
standpoint of non-listening in general. If we know why cer- 
tain people do not listen to a given type of program and how 
many people do not listen for these reasons, we can plan pro- 
gram changes which may not only improve content level but 
at the same time increase the total amount of listenership. 
Similarly, if an extensive survey were made of people who 
seldom or never listen to the radio at all, we could round out
our picture of radio as a cultural expression and a cultural tool. 

The Nature of Program Research.—Such broad audience 
surveys as those outlined above cannot possibly encompass the 
more specific problems of listener likes and dislikes, listener 
gratifications or the extent to which listener attitudes are 
changed or modified by radio listening. Therefore, the investi- 
gator finds it increasingly necessary to study particular pro- 
grams or series of particular programs. In doing so, however, 
he comes face to face with research problems which are not 
especially peculiar to radio but which have their counterparts 
elsewhere in the communications field. How do you measure 
listener reaction to a program? What specifically does a lis- 
tener mean when he says he liked or disliked a certain program? 
How can one measure the “effectiveness” of a given informational
program? How can one determine the cumulative effect of a
series of programs? 

There are three different ways of learning what a program 
means to people: by subjecting the program to a content analy- 
sis, by making a differential analysis of the personal character- 
istics of the groups that listen to the program, or by asking 
people directly what the program means to them. Wherever 
possible all three methods should be used simultaneously. 

Content analysis of radio material involves essentially the
same techniques as are used in the analysis of printed materials,
and is usually based on scripts or transcripts of the broadcasts
(unless one is studying all programs for the occurrence of a
certain type of content, in which case one has to resort to
“monitoring”). From such analysis the investigator is able to list
most of the affective factors of a broadcast. Thus the content 
analyst, after listening to a few instalments of a daytime serial 
script, may learn that it stresses an individualistic, competitive 
type of social relationship, that the surgeon hero wants to be 
a great man and stand high in a prestige hierarchy, that his 
interest is in himself and not in humanity. Or he may discover 
that the negro character depicted in the series is a servant whose 
chief characteristic is doglike devotion to his master with little 
or no portrayal of any individual thoughts, feelings or individu- 
ality of his own. In another study a content analysis of a Kate 
Smith script reveals that she sometimes uses the word America 
or American as many as seven times in five minutes, thus 
building up an associative complex which contributes to her 
reputation as a patriot. 
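The Kate Smith count illustrates the purely mechanical side of content analysis. A minimal Python sketch, assuming the script is available as plain text and that only the exact words "America" and "American" are to be counted (plurals such as "Americans" are deliberately not matched); the function name and sample text are hypothetical.

```python
import re

# Whole-word match for "America" or "American" (case-sensitive, no plurals).
PATRIOTIC_TERM = re.compile(r"\bAmerican?\b")


def term_frequency(script_text, minutes):
    """Return (number of occurrences, occurrences per minute) for a
    script segment of the given length in minutes."""
    hits = len(PATRIOTIC_TERM.findall(script_text))
    return hits, hits / minutes


sample = "America needs you. Every American can help America today."
print(term_frequency(sample, minutes=1.0))
```

A tally like this only locates and counts the device; judging what the repetition contributes to a speaker's reputation remains the analyst's task.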

But, as we have already suggested, content analysis is im- 
portant largely as the first step in the study of any particular 
program or series of programs. In subjecting the script or 
scripts to a preliminary content analysis, the investigator accomplishes
two objectives. First, he is able to distribute his questions on
the various components of the broadcast with some regard to 
their frequency and importance. Secondly, a thorough-going 
content analysis permits certain inferences about what the lis- 
teners may get out of the content, or at least will give the 
investigator some idea of what not to look for. Because it pro- 
vides both balance and perspective, content analysis is usually 
the first step in program research, whether it be an investigation 
of one program or a series of programs. 

The second way to find out what a program means to people 
is to determine what sex, age, and social groups listen to it. 
Much is known about the psychological differences among various
strata of the population, and if the program is listened to
by some of these groups more than by others, the nature of its
appeal can be more readily understood. If, for example, the 
audience of one of two comedians is more highly educated than 
the audience of another, then it can safely be assumed that the
first offers a more sophisticated kind of humor. The character- 
istics which are to be isolated will of course vary with the prob- 
lem at hand. In a study of the audience for a child guidance 
program, for example, whether or not the listener has children 
was a pertinent factor. It was found that quite a number
of childless women were among the regular listeners,
hence the conclusion that the practical advice offered is not the 
only appeal in this program. Some women, regretting their 
lack of children, might derive a vicarious satisfaction from 
hearing child problems discussed, while for others the broad- 
casts might have general educational value. 

If more general listening habit surveys included detailed 
information about the listener, such as reading habits, leisure 
time activities, community participation and so on, such mate- 
rial would become a useful tool for the further analysis of what 
certain kinds of programs mean to listeners. 

One of the major problems of program research is how to get 
the respondent to indicate what, in the program or series of pro- 
grams under study, is responsible for his reactions to it. It 
means little or nothing to the program planner, for instance, if 
he is told that 70% of the respondents studied liked a program 
very much, 20% liked it moderately well and 10% did not like 
it at all. It does not tell him what could be done to improve the 
attitudes of the other 30% or whether changing it to meet their 
taste would not at the same time antagonize the 70% who liked 
it in its original form. After considerable experimentation, how- 
ever, the Office of Radio Research has developed a technique 
which seems to contribute to the solution of this basic problem. 
This technique involves the adaptation of the polygraph fre- 
quently used in experimental psychology, and is known as the 
Lazarsfeld-Stanton Program Analyzer. 

The Program Analyzer is an apparatus which enables a 
group of respondents to record their reactions to a radio pro- 
gram, as they listen to it, by pressing red (“dislike”) and green 
(“like”) buttons, or by not pressing buttons, which signifies 
indifference to what is being heard. The push buttons are con- 
nected with a pen which moves along a roll of tape synchronized 
with the radio program, thus making a permanent record of the
reactions of the group (8, 15). 
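The kind of group record described here can be mimicked in a few lines. The following Python sketch is a hypothetical tally, not the Analyzer's actual mechanism: it assumes each respondent's reactions have already been sampled into equal time intervals and coded as "like" (green button), "dislike" (red button), or None (no button pressed, i.e., indifference). It reports the counts and a net like-minus-dislike figure for each interval, together with the program's high and low points. The function names and data are illustrative.

```python
from collections import Counter


def reaction_profile(responses):
    """Summarize group reactions interval by interval.

    responses: one list per respondent, with one entry per time interval:
    "like", "dislike", or None (no button pressed).
    """
    n_intervals = len(responses[0])
    profile = []
    for t in range(n_intervals):
        counts = Counter(person[t] for person in responses)
        profile.append({
            "like": counts["like"],
            "dislike": counts["dislike"],
            "indifferent": counts[None],
            "net": counts["like"] - counts["dislike"],
        })
    return profile


def high_and_low_points(profile):
    """Indices of the intervals with the highest and lowest net reaction."""
    nets = [interval["net"] for interval in profile]
    return nets.index(max(nets)), nets.index(min(nets))


# Hypothetical group of three respondents over three intervals.
group = [
    ["like", "like", None],
    ["like", "dislike", "dislike"],
    [None, "dislike", "dislike"],
]
profile = reaction_profile(group)
print(profile)
print(high_and_low_points(profile))
```

Such a profile only locates the high and low points; as the following paragraph explains, the interview is what accounts for them.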

Such a record alone, while interesting as a picture of the high 
and low points of the program as far as a given group of lis- 
teners is concerned, is comparatively meaningless from the 
standpoint of program improvement. The important thing for 
the program planner or the educator to know is what there was 
about a particular part of the program that the listener found 
dull or interesting and why, in terms of the listener’s own background
and experience. The Program Analyzer Technique is
therefore nearly always combined with a focused interview⁵ in
which the trained investigator, using the Program Analyzer 
graph for reference, is able to determine just what it was about 
the program that caused the reactions indicated on it, and just 
what these reactions mean in the experience of the listener. 
Every research man who has tried to determine the “why” of 
reactions to a particular experience will recognize the advan- 
tage of this method. It gives him a picture of reactions which 
occurred simultaneously with the experience, and obviates a 
frequent difficulty in retrospective interviewing, to wit, that the
respondents often fail to remember how they felt in the earlier 
parts of the experience. Like many techniques developed in 
one field of communications, this one is useful in others as well 
and has been used successfully by the Bureau of Applied Social 
Research in testing reactions to motion pictures. 

The Program Analyzer, of course, can be used to study 
reactions to any kind of program. It has been found useful in 
determining the effectiveness of educational broadcasts, in 
analyzing the appeal of entertainment programs and measuring 
the impact of commercial announcements. The usual procedure
is to interview 10 or 20 groups of people (10 to 15 in a
group), carefully selected to be representative of the audience
the program is designed to reach. They first listen to the program,
recording their reactions with the Program Analyzer
push buttons, and are then questioned by a highly trained
interviewer, with all remarks recorded by a stenotypist. Their
comments are then analyzed in conjunction with Program
Analyzer graphs, and the investigator can thus determine what
the effective components of the broadcast were and can make
recommendations as to which parts of the program should be
changed or taken out for more satisfactory results.

5 The focused interview is a term applied to the technique of determining reactions
to a particular communication or experience (a motion picture, a radio program,
printed material and so on), known to the investigator, as distinguished from the
more diffuse type of interview which is required when studying listening habits or
attitudes which may be the result of several different experiences which are usually
unknown to the investigator. The focused interview is a rather complicated procedure,
and the O.R.R. is now in the process of codifying the results of its experience
with this technique with various media of communication. The results of this
systematization have been summarized by Robert K. Merton and Patricia Kendall, in
an article “The Focussed Interview” to appear in the American Journal of Sociology
in the spring of 1946.

When a radio investigator wants to probe deeper, to deter- 
mine the gratifications of certain segments of the radio audi- 
ence, interviews of a more elaborate kind are in order. Such 
studies usually involve two steps: detailed and exploratory case 
studies, followed by less detailed interviews with a larger sam- 
ple, for statistical verification of the hypotheses developed from 
the qualitative data. This combination of qualitative and 
quantitative research has two advantages. On the one hand, 
the first step enables the investigator to gain rich psychologi- 
cal insights which permit him to cover the wide range of possible 
responses in the statistical survey. On the other hand, the 
qualitative material enables him to understand, clarify and 
illustrate the quantitative data more fully. The combination 
of the two types of research has proven so fruitful that it has 
become an established procedure in many of the studies under- 
taken by the Office. Perhaps the best way to illustrate its value 
is by way of a concrete example. 

An Example of Program Research.—The problem was to
determine the gratifications of the millions of women who listen 
to the serial stories broadcast throughout the day by the major 
networks. As a first step, 100 women from various age and 
socio-economic groups were interviewed intensively (9). An- 
alysis of their reports about their listening experiences and the 
satisfactions they derive from it indicated that there are three 
major types of gratification in listening to daytime serials. 
Some listeners enjoyed them primarily as a kind of emotional 
release. Burdened with their own problems, they claimed it 
“made them feel better to know that other people have 
troubles, too.” A second and more obvious form of enjoyment
of the serials comes from the vicarious experiences they supply.
A third gratification was entirely unanticipated by the investi- 
gator and constitutes a good illustration of the value of this 
kind of intensive interviewing. Many women listened to serials 
because they provide standards of value and judgment and help 
them to solve everyday problems. They learn things from 
these stories which they use later in solving their own problems: 
“Bess Johnson shows you how to handle children. She handles 
all ages. Most mothers slap their children. She deprives them 
of something. That is better. I use what she does with my 
own children.” Or they provide comfortable philosophy for 
use with one’s self or others: “When Clifford’s wife died in 
childbirth the advice Paul gave him I used for my nephew when 
his wife died.” 

In this way, by providing such “leads,” the intensive inter- 
views opened up the areas for investigation on a more quanti- 
tative basis. Later, when 2,500 listeners were interviewed (10), 
41% claimed to have been helped by daytime serials, thus 
giving statistical validity to a gratification which might have 
been overlooked altogether had the intensive interviews not 
been made. With the larger sample it became possible to make 
cross-tabulations which showed what kind of women found the 
serials helpful in this way. Thus, for example, it developed that 
the less formal education a woman has the more help she derives 
from the serials. The quantitative material also made it possi- 
ble to analyze the nature of this help, and it developed that 
listeners find these programs useful in several ways: getting 
along with people, helping people with their personal problems, 
learning how to handle themselves in particular situations, 
learning how to accept misfortune with a smile, and so on. 
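The cross-tabulation mentioned above amounts to computing, for each level of a background characteristic, the percentage of respondents giving a particular answer. A minimal Python sketch, with hypothetical education categories and invented records for illustration (the study's actual figures are not reproduced here):

```python
from collections import defaultdict


def percent_by_group(records):
    """records: iterable of (group, answered_yes) pairs.
    Returns {group: percentage of that group answering yes}."""
    totals = defaultdict(int)
    yes = defaultdict(int)
    for group, answered_yes in records:
        totals[group] += 1
        if answered_yes:
            yes[group] += 1
    return {group: 100.0 * yes[group] / totals[group] for group in totals}


# Invented illustrative data: (education, said the serials helped them).
records = (
    [("grammar school", True)] * 6 + [("grammar school", False)] * 4
    + [("high school", True)] * 4 + [("high school", False)] * 6
    + [("college", True)] * 2 + [("college", False)] * 8
)
print(percent_by_group(records))
```

A percentage that falls from the least to the most educated group is the kind of pattern the text reports; with real data the base count for each group should be shown alongside its percentage.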

Analysis of listener reaction combined with content analy- 
sis (1, 7, 14) of the scripts themselves then led to certain 
inferences about the rôle of such programs in our culture. It
was found, for example, that these so-called true life stories do 
not deal with basic social or economic problems. They do not 
show a woman how she can improve her economic status nor 
do they give her a better understanding of the current problems 
of our time, such as those of minority groups. They tend, rather, to
imbue the listeners with a fatalistic philosophy of life: this is 
how it is, we aren’t as badly off as we might be. They help the 
listener to accept her fate by universalizing it—e.g., “husbands 
never understand their wives.” Thirdly, they encourage the 
listener to live life through ready-made formulas for behavior 
rather than helping her to develop a critical sense which will 
enable her to determine what is good or bad for her in a particu- 
lar situation. 

The field of program research, however, has been by no 
means completely explored. Little, for example, is known 
about the maximum potential of radio from an educational and 
cultural standpoint. We know, to be sure, that by and large 
the programs that are known and promoted as “educational” 
reach a relatively small proportion of the radio audience, chiefly 
those who would make a point of acquiring the same informa- 
tion from another medium if it were not available to them over 
the air. It is known that such programs will not reach even 
these relatively few listeners unless organized efforts are made 
to build an audience (14). But what about the utilization of 
such already accepted programs as the daytime serials as a 
means of raising, rather than catering to, the cultural level of 
the average listener? Sponsors feel they would thereby lose
some of their audience. But the fact remains that few have tried
to improve the serials, and there is as yet no proof that the sponsors
are right or wrong.

Effect Studies.—Another more specialized form of radio research
pertains to the effectiveness of one section or element of
a program. The commercial sponsor may want to determine 
the effectiveness of his commercial announcement. He may 
want to compare the effectiveness of two or more different 
presentations. The program planner may want to determine 
the extent to which his program depends upon the popularity 
of any single feature in it. He may want to compare commentators
or announcers to determine which one is most acceptable
to the greatest number of listeners. In all these cases the
research procedure, as in most stimulus-response studies, 
involves holding all factors constant except the one under 
study. Thus, to determine the relative appeal of two
commercials, matched groups of respondents (or sometimes the
same respondents) will listen to two broadcasts which are alike 
in all respects except the commercial. If no extraneous factors are
involved, any difference in reaction to the two can be attributed to
a difference in the appeal of the two commercials.

If the sponsor does not have two or more specific com- 
mercials which he wants to compare, but wants, rather, to 
determine the effectiveness of a particular one, the problem is 
somewhat different. In the first place, he must decide whether 
he wants to measure the effectiveness of the commercial in 
terms of the number of sales of his product which it induces or 
is likely to induce, or whether he is concerned only with the 
extent to which the commercial is liked or disliked. (The rela- 
tionship between liking a commercial or any other kind of per- 
suasive appeal and being induced to act as a result of it is, 
incidentally, a problem which needs much further exploration. 
Studies done to date indicate that one may dislike a commercial 
intensely—“spot” announcements, for example, or singing com- 
mercials—and still be influenced by them.) If the investigator 
is primarily interested in sales effects rather than in what ele- 
ments of the commercial make it effective, a controlled check 
is commonly used. A section of the population is “exposed” 
to the appeal and sales figures for the product in that area are 
checked against those in a comparable area where the popula- 
tion was not so exposed. An alternative to this procedure in- 
volves interviewing buyers of the product to determine how 
they came to buy it. Most advertisers, however, seem to oper- 
ate on the theory that there is a connection between liking a 
commercial and buying the product which it extols. Conse- 
quently they are interested in research which will determine 
the degree of acceptance or rejection of the commercial an- 
nouncement itself. This problem involves quite different 
techniques. 
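The controlled sales check described above comes down to comparing two areas. The following is a minimal sketch. It assumes, beyond what the text specifies, that sales are known for both areas before and after the campaign, and it expresses the effect as the exposed area's growth relative to the comparable area's growth. The function name and all figures are hypothetical.

```python
def relative_lift(exposed_before: float, exposed_after: float,
                  control_before: float, control_after: float) -> float:
    """Growth in the exposed area divided by growth in the comparable
    unexposed (control) area, minus 1.

    A result of 0.10 means sales in the exposed area grew 10 percent
    more than sales in the control area over the same period.
    """
    if min(exposed_before, control_before, control_after) <= 0:
        raise ValueError("baseline and control sales must be positive")
    exposed_growth = exposed_after / exposed_before
    control_growth = control_after / control_before
    return exposed_growth / control_growth - 1.0


if __name__ == "__main__":
    # Hypothetical unit sales in two matched towns, before and after the campaign.
    lift = relative_lift(exposed_before=1000, exposed_after=1320,
                         control_before=1000, control_after=1100)
    print(f"relative lift: {lift:.1%}")
```

Comparing growth rather than raw totals allows for some difference in size between the two areas. It does not, however, guard against local events that affect only one of them.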

If the advertiser is concerned only with the interest aroused 
by the commercial in a given program context, the Program 
Analyzer technique is in order. The graph will show clearly 
the relative position of the commercial within the framework 
of reactions to the program as a whole. From the focused
interview which follows he can then learn much about what was
liked or disliked about the commercial and what in terms of the 
listeners’ own experiences caused the favorable or unfavorable 
reaction. This technique is also useful in determining the effec- 
tiveness of commercials placed at various stages of the pro- 
gram—e.g., is the commercial placed at the beginning, end or 
middle of the program more effective? Should it follow a peak 
of interest in the program to capitalize on the high degree of 
attention at that point, or would such an approach cause a “letdown”
among listeners that might boomerang as resentment that
“something is being put over” on them?

Another technique for studying commercial announcements 
has been found especially useful in testing reactions to “touchy”
subjects or very personal products, and in testing institutional
advertising. This involves an intensive “depth” interview which
is made immediately after the subject has read or heard the 
advertisement, and is equally useful for printed advertisements. 
Here the interview is customarily of an associative nature. 
What words or ideas are taboo? What words cause unpleasant 
associations which might in turn result in an unfavorable atti- 
tude toward the product or sponsor? This technique, inciden- 
tally, has suggested the interesting possibility that certain 
matters can be discussed in print which are not acceptable over 
the air and that, contrariwise, some approaches are more
effective orally than in print.

By and large, however, it has been found that the best way 
to make people articulate about commercial announcements, 
a subject which often leaves them lethargic at best, is to have 
them compare two or more. Most people are not sufficiently 
interested in such matters to become very talkative about their 
reactions, and the necessity of making a choice between two or 
more often provides the necessary impetus to self-examination 
as to why they selected one or the other. 

Another type of program research problem to which we have 
already alluded involves the program series. To investigate
just one of a series of educational or entertainment or dramatic
programs would not be a valid test of the effectiveness of the
series, for reactions to a given program may be partly predicated
on remembrance of what went before and expectation of what
is to come. Then, too, if the investigator is interested in changes
of attitudes as a result of a program series, he will get little from
merely testing one program. The technique which has been
developed to solve this problem is called the “panel,” which, 
reduced to its barest essentials, involves the selection of a group 
of people who agree to listen to a series of broadcasts and then 
report their reactions to the various programs. They may agree 
to come to a given place in a group and participate in a group 
interview using the Program Analyzer, or they may agree to 
record their reactions to various programs on formal question- 
naires (the latter, of course, is in order when a nationwide 
sampling is desired). Thus a virtually constant and identical 
group is made available for the examination of a series of pro- 
grams, and detailed comparisons of one program with another 
in a series can be made. Obviously, one can also procure other 
information from such a group—reading habits, program prefer- 
ences, movie attendance, and so on, which is helpful in evalu- 
ating variations in listener reactions. 
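Because the panel keeps the same respondents, each person's attitude after a series can be set against the same person's attitude before it, and not merely the group totals. One common way to summarize this is a "turnover" table that counts respondents by their answer at two waves of interviewing. Its use here is an assumption: the text does not describe it. Respondent keys and attitude labels are hypothetical.

```python
from collections import Counter


def turnover_table(wave1: dict[str, str], wave2: dict[str, str]) -> Counter:
    """Count (answer at wave 1, answer at wave 2) pairs for the respondents
    who answered in both waves of the panel."""
    common = wave1.keys() & wave2.keys()
    return Counter((wave1[r], wave2[r]) for r in common)


def net_and_gross_change(table: Counter, category: str) -> tuple[int, int]:
    """Net change: respondents who moved into `category` minus those who left it.
    Gross change: all respondents whose answer changed in either direction."""
    gained = sum(n for (a, b), n in table.items() if a != category and b == category)
    lost = sum(n for (a, b), n in table.items() if a == category and b != category)
    changed = sum(n for (a, b), n in table.items() if a != b)
    return gained - lost, changed


if __name__ == "__main__":
    before = {"r1": "favorable", "r2": "unfavorable", "r3": "favorable", "r4": "unfavorable"}
    after = {"r1": "favorable", "r2": "favorable", "r3": "unfavorable", "r4": "favorable"}
    table = turnover_table(before, after)
    print(table)
    print(net_and_gross_change(table, "favorable"))
```

In the example, the group total of favorable answers rises by only one, yet three of the four panel members changed their answers. Only a repeated-interview design of this kind reveals that difference.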

Special Characteristics of Radio—There are at least six 
characteristics of radio which distinguish it from other media. 
Researchers in the Office have had to take these into considera- 
tion when studying its effectiveness from any particular stand- 
point. Each of these qualities has both positive and negative 
aspects from the standpoint of effective communication. 

Perhaps most significant of these characteristics is radio’s 
accessibility. Nearly every person in the United States has 
access to a radio. There are few geographic or economic
barriers to its use once the initial investment has been made. In a
sense, then, radio is more readily available than the other mass 
media, for each magazine, newspaper and motion picture must 
be purchased separately. But in another sense radio is less 
accessible than these other media. Once one has bought a 
newspaper or magazine, one can keep it. It may be read at any
time, and an interruption or a lack of comprehension of a passage
is not a serious matter, for it can always be re-read. But
a radio program is as ephemeral as time itself. If the telephone 
or doorbell rings just when John Kieran is about to answer a 
baseball question or just when the mystery is about to be solved,
one cannot set back the needle to pick up what has been lost.
Motion pictures are also ephemeral in this sense, but the cir- 
cumstances under which they are seen tend to offset this factor: 
one is not likely to be interrupted in a theater, and if a movie- 
goer so desires, he can always sit through a second showing. 

Another special characteristic of radio is that it relies on 
auditory perception. The human voice is a more personal, 
direct and potentially more stimulating means of communica- 
tion than the printed word, but this does not necessarily mean 
that radio is a more effective means of transmitting all kinds of 
communication to all kinds of persons. Studies done by the 
O.R.R., for example, indicate that people in the higher social 
and economic brackets more often prefer to read factual
information rather than hear it.

A third characteristic of radio is in part the outgrowth of 
the first two. Its accessibility, combined with its reliance on 
auditory perception, enables people to listen while carrying on 
a variety of other activities which do not necessarily interfere 
with their perception. But at the same time this quality of 
non-interference leaves the radio program liable to a low degree 
of attention. The listener may become so conditioned to it that 
he no longer hears it with any degree of acuity. This poses a 
problem for the investigator and necessitates a thorough prob- 
ing of the seemingly factual statement: “Yes, I listened to that 
program” to determine the degree of concentration concealed 
behind it.* 

A fourth characteristic of radio is that it continues in time.
This means that a series of programs may become part of the 
daily or weekly habit patterns of the listeners, that cumulative 
effects can be built up over long or short periods. But it also
means that it is liable to surfeit. It may be true that “if you
hear a thing often enough you will come to believe it,” but it is
probably equally true that if you hear a thing too often you
may not pay any attention to it at all after a time. Just where
repetition ceases to be effective, just when saturation points are
reached, is still a problem which has to be faced anew for each
kind of program or message.

* The problem of developing techniques to gauge degree of attention to radio programs is becoming a matter of great concern to television producers, who want to know not only whether and how well a program was heard, but also whether and with what degree of attention it was seen.

There are two other characteristics of radio which develop 
from its accessibility. A national network may reach into 
homes all over the country, but if it does so, it must confine its
appeal to a general one. This national quality prevents it from 
appealing directly to local interests and experiences. Theoreti- 
cally, of course, the potential audience is great enough for a 
nation-wide program to be beamed at special groups such as 
fishermen, students or stamp collectors and still reach a sizeable 
number of people. But since the aim of the networks is to reach 
as many people as possible at a given time, their specialized 
appeals are confined to large groups such as farmers or house- 
wives who are known to constitute the majority of listeners at 
certain times of the day. Appeals to smaller groups are left for 
the local stations which cannot hope to compete with the net- 
works on their own ground. The coming of frequency modula- 
tion might bring many changes in this respect. 

The discussion of the nature of the problems met in the field 
of research and some of the techniques developed to meet them 
may be sufficient to bring the reader to two conclusions already 
evident to social scientists concerned with radio. First, there 
is a growing awareness of the necessity of systematizing the 
knowledge and experience accumulating in this field: a
conviction that such self-conscious rigorization of procedures will be
of value not only to radio research but to the
science of communications in general (and perhaps even to 
other fields of social research). Secondly, as this formulation 
and formalization of procedures and problems proceeds, the 
sociologist and psychologist working in radio research become 
increasingly humble about what they do not know. But even 
at this comparatively early stage of their development, radio 
research activities have already stimulated magazine and
newspaper publishers to do more research than ever before, and it is
possible that in the not too distant future not only will
techniques and problems be exchanged between these two fields, but
funds and research institutions may be merged for the greater
benefit of both.
THE DUTIES OF CIVIL SERVICE EXAMINERS AND
TEST TECHNICIANS 


The following check list is one of a series being prepared
under the auspices of the Society for Public Administration to 
depict what public personnel workers do. It is a modification 
of a list originally prepared by Dr. John M. Pfiffner for the 
Committee on Education of the Society. It is presented here 
in the hope that readers of EDUCATIONAL AND PSYCHOLOGICAL
MEASUREMENT will offer their constructive criticisms before the
list is included in the final published report. Comments will 
be welcomed by the chairman of the Committee, Mr. Edgar W. 
Lancaster, Office of the Secretary of War, Room 4E 978, Penta- 
gon, Washington 25, D. C. 

It should be noted that not all of the duties listed below are
ordinarily performed by any one person. The list is intended
to cover the work which may be done by all workers engaged
in merit system examining. A list of the main areas of activity
is given first, followed by the more detailed outline.


Main Topics 


I. Plan examinations.¹
II. Construct tests. 
III. Supervise the scoring of tests. 
IV. Evaluate training and experience. 
V. Supervise the administration of tests. 
VI. Supervise the conduct and scoring of competitive oral 
interviews. 
VII. Establish registers of eligibles. 
VIII. Serve as consultant for the service rating program. 
IX. Participate in establishing classification specifications.
X. Conduct research on examinations.

¹ The term “examination” is used in a broad sense to indicate the entire procedure by which a person’s qualifications are evaluated with respect to a position or series of positions. An examination may include a test or tests, an oral interview, or an appraisal of training and experience, either singly or in any combination.


Detailed Outline 


I. Plan examinations. 

A. Assemble data concerning the duties of the positions
for which examinations are to be conducted.

1. Consult class specifications. 

2. Secure additional job descriptions from operating 
officials. 

3. Search for additional job standards. 

4. Interview supervisors, foremen, and workers. 

5. Observe actual work being done and do some of 
the work, if practicable. 

6. Read laws, regulations, directives, manuals and 
books which have a bearing on the job. 

7. Prepare summary showing 
a) Duties. 
b) Knowledge necessary. 
c) Skills used. 
d) Personal characteristics required. 

B. In the light of information obtained in Step I-A, 
determine the minimum requirements for taking the 
examination, if any, within the limits set by legis- 
lation. 

C. Outline the nature of the examination considered 
to be most suitable for selecting qualified workers. 
Within the limits set by legislation, and usually after 
consultation with experts in the field, determine 
whether it shall include an evaluation of training
and experience, a competitive oral interview, and
written or performance tests, giving due considera- 
tion to any pertinent experimental data which may 
be available. 

D. Determine weights to be assigned the major parts
of the examination, taking all pertinent information
into account.

E. Prepare the copy for an announcement of the
examination for the printers, or pass on the requisite
information to those in the organization charged with the
responsibility of issuing announcements. Include 
information on the following topics: 

1. Title of position.

2. Grade and salary of position.

3. Location of employment.

4. Hours of work.

5. Detailed explanation of education and experience requirements.

6. General information.

a) Whether written tests, performance tests, and competitive oral interviews will be given.

b) Provisions of regulations regarding qualifications statements.

c) General eligibility requirements in regard to citizenship, veterans’ preference, etc.

d) Information as to how and where application should be made.

e) Weights to be given the various parts of the examination.

II. Construct tests according to plan. 

A. Construct written tests. 

1. Select promising existing test items from file, 
noting statistical history resulting from item 
analysis. 

2. Edit old test items to suit current need. 

3. Write new test items, observing the best psycho- 
logical and psychometric techniques and proce- 
dures. In the case of achievement tests: 

a) Read widely in subject-matter field. 
b) Collect background material. 
c) Confer with other personnel technicians. 
d) Ask subject matter experts to submit test 
material. 
e) Train subject matter experts in test con- 
struction. 


4. Check on the final content of the test.

a) Check items for possible defects such as faulty phrasing or the presence of specific determiners.

b) See that items are of appropriate difficulty.

c) In the case of achievement tests, make sure that concepts within the required areas are adequately sampled and that each area is appropriately represented.

d) Check to see that repetition of concepts is avoided and that the content of one question does not help in answering other questions.

e) Check achievement test items with operating officials and subject matter experts.

5. Assemble the test in final form with written directions, instructions for administration, answer keys, and directions for scoring.

B. Develop performance tests, when called for, on the basis of the study made in step I-A, giving due consideration to ways of measuring the process of performance as well as the final product.


III. Supervise the administration of tests.

A. Arrange for the use of suitable room or rooms.

B. Arrange for enough qualified proctors.

C. Arrange to have tests, pencils and necessary apparatus ready, with precaution taken to prevent tests from being inspected before they are given.

D. Train proctors.

E. Administer tests.

IV. Supervise the scoring of tests.

A. Supervise the preparation of scoring keys for objective-type questions.

B. Plan the scoring procedure for questions which are not objective, taking steps to insure reliability and uniformity of scoring.

C. Develop the scoring procedure for performance tests, if used.

D. Train scorers.

E. Determine weights to be given parts of the test, taking all pertinent information into account.

F. Obtain total scores in accordance with the plans developed for the examination.

G. Transmute total scores to a basis appropriate for combining with any other measures of qualifications which may be used, with due consideration for any legal requirements which may exist.
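Steps E through G can be illustrated with a short sketch. It assumes one common practice that the check list does not itself prescribe: each part is converted to standard scores before weighting, so that differences in the spread of raw scores do not distort the intended weights. The weighted composite is then transmuted to a scale with mean 50 and standard deviation 10. Part names, weights and scores are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(raw: list[float]) -> list[float]:
    """Standard scores (x - mean) / SD, using the population SD."""
    m, s = mean(raw), pstdev(raw)
    if s == 0:
        raise ValueError("a part with no spread cannot be standardized")
    return [(x - m) / s for x in raw]


def weighted_composite(parts: dict[str, list[float]],
                       weights: dict[str, float]) -> list[float]:
    """Weighted sum of standardized part scores for each candidate.

    Standardizing gives every part the same spread before weighting;
    correlations among the parts still influence their effective weights."""
    standardized = {name: z_scores(scores) for name, scores in parts.items()}
    n = len(next(iter(parts.values())))
    return [sum(weights[name] * standardized[name][i] for name in parts)
            for i in range(n)]


def to_t_scale(scores: list[float]) -> list[float]:
    """Transmute scores to a scale with mean 50 and SD 10."""
    return [50 + 10 * z for z in z_scores(scores)]


if __name__ == "__main__":
    parts = {
        "vocabulary": [30, 42, 25, 38, 45],   # hypothetical raw scores
        "arithmetic": [12, 15, 9, 20, 14],
    }
    weights = {"vocabulary": 2.0, "arithmetic": 1.0}  # hypothetical weights
    composite = weighted_composite(parts, weights)
    print([round(t, 1) for t in to_t_scale(composite)])
```

Any legal requirement about the final scale, for example a fixed passing grade, would replace the mean-50, SD-10 choice made here.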


V. Evaluate training and experience.

A. Determine the amount of credit, if any, to be given various types of experience, obtaining the advice of experts in the field and making use of validation studies, where possible.

B. Determine the amount of credit, if any, to be given various types of training on the basis of the advice of experts in the field and the conduct of validation studies, where possible.

C. Develop a schedule to be used in evaluating training and experience in accordance with the credit system developed.

D. When the rating is not done by a rating technician or subject matter expert, train clerks in the use of the form for evaluating training and experience and supervise them in applying the procedure.

E. Decide special questions which arise in connection with evaluating training and experience.

F. Transmute raw scores on training and experience to a basis appropriate for combination with other measures, with due consideration for any legal specifications which may exist.


VI. Plan and supervise the conduct and scoring of competitive oral interviews when such interviews are employed by the merit system.

A. Determine characteristics to be measured by the oral interview as distinguished from those covered in other parts of the examination.

B. Develop instructions for interviewers.

C. Develop rating scales for factors to be observed in the oral examination.

D. Outline desirable qualifications for members of interviewing boards.

E. Train members of interviewing boards.

F. Develop and apply a method of obtaining scores based on competitive oral interviews.

1. Obtain a score for each interviewer’s report.

2. Combine the scores from all interviewers for each applicant, taking necessary steps to insure that ratings are given equal weight.

3. Transmute the scores to a standard scale appropriate for combining with other measures, with due consideration for legal requirements.
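Step F.2 asks that the ratings of all interviewers carry equal weight. One way to achieve this, assumed here rather than taken from the check list, is to standardize each interviewer's ratings across the applicants he rated. A lenient or severe board member, or one who spreads his marks widely, then counts the same as any other. Each applicant's standardized ratings are then averaged. The sketch assumes every interviewer rated every applicant. Names and ratings are hypothetical.

```python
from statistics import mean, pstdev


def combine_interviewers(ratings: dict[str, dict[str, float]]) -> dict[str, float]:
    """ratings[interviewer][applicant] -> raw score.

    Returns each applicant's mean standardized score across interviewers,
    so that no interviewer's level or spread of marks dominates."""
    standardized: dict[str, list[float]] = {}
    for by_applicant in ratings.values():
        values = list(by_applicant.values())
        m, s = mean(values), pstdev(values)
        for applicant, raw in by_applicant.items():
            # An interviewer who gave everyone the same mark adds no information.
            z = (raw - m) / s if s else 0.0
            standardized.setdefault(applicant, []).append(z)
    return {applicant: mean(zs) for applicant, zs in standardized.items()}


if __name__ == "__main__":
    board = {
        "member_1": {"Adams": 90, "Baker": 85, "Clark": 80},  # lenient, narrow
        "member_2": {"Adams": 70, "Baker": 50, "Clark": 60},  # severe, wide
    }
    for applicant, score in combine_interviewers(board).items():
        # Step F.3: a standard scale such as 50 + 10z could follow here.
        print(applicant, round(score, 3), round(50 + 10 * score, 1))
```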


VII. Establish registers of eligibles.

A. If weights were not previously established, determine the weights for the component parts of the examination, taking all pertinent information into account.

B. Combine component parts of the examination in order to give established weights to the various parts.

C. Set a passing point and transmute the scores to the standard grading system in use.

D. Adjust final scores for special preference groups, if any.

E. Supervise the preparation of the register with the eligibles placed in the order of final score.
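Steps B through E can be sketched as follows. Every numeric rule here is a hypothetical assumption, not part of the check list: composite scores are taken as already weighted; the passing point is a composite value; a score at the passing point maps to 70 and the highest composite to 100; and preference is a flat number of points added to passing grades only. In practice these rules are fixed by the applicable law and regulations.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    composite: float                 # weighted total of the examination parts
    preference_points: float = 0.0   # hypothetical flat preference credit


def build_register(candidates: list[Candidate], passing_point: float,
                   pass_grade: float = 70.0,
                   top_grade: float = 100.0) -> list[tuple[str, float]]:
    """Return (name, final grade) for eligibles, highest grade first.

    Hypothetical linear transmutation: passing_point -> pass_grade and the
    highest composite -> top_grade; candidates below the passing point are
    not placed on the register."""
    top = max(c.composite for c in candidates)
    if top <= passing_point:
        raise ValueError("no candidate exceeds the passing point")
    slope = (top_grade - pass_grade) / (top - passing_point)
    register = []
    for c in candidates:
        if c.composite < passing_point:
            continue
        grade = pass_grade + slope * (c.composite - passing_point)
        register.append((c.name, grade + c.preference_points))
    # Order of final score; Python's sort is stable, so ties keep input order.
    register.sort(key=lambda entry: entry[1], reverse=True)
    return register


if __name__ == "__main__":
    applicants = [
        Candidate("Adams", 62),
        Candidate("Baker", 80),
        Candidate("Clark", 45),
        Candidate("Dunn", 71, preference_points=5),
    ]
    for name, grade in build_register(applicants, passing_point=60):
        print(f"{name:<8}{grade:6.1f}")
```

Whether preference points may carry a grade above 100, and how ties are broken, are policy questions that this sketch does not settle.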


VIII. Serve as consultant for service rating program.

A. Assist in developing the rating schedules to be used and the method of scoring them.

B. Participate in the analysis of the results from service ratings.

C. Assist in handling problems arising from the use of service ratings.


IX. Participate in establishing classification specifications.

A. Conduct studies on the relation of minimum requirements of education and experience to effectiveness of job performance.

B. Consult with classification analysts on the analysis of required knowledges, skills, and abilities in meaningful, measurable psychological terms.

C. On occasion, apply measurement techniques to the evaluation of factors determining the allocation of positions to specific grades.


X. Conduct research on examinations.

A. Conduct validation studies, preferably before tests are used, if the need for maintaining the confidential nature of the tests allows.

1. Give the tests to a group or groups of persons comparable to those to whom the test will be applied.

2. Determine criteria against which the tests should be validated and develop reliable measures of the criteria, when possible.

3. Obtain measures of the relation between the tests and the criteria. Make a study of the validity of the test items.

4. Revise the test or scoring procedure in the light of studies of validity.

5. Continue validation studies through investigating the relation of test scores to subsequent performance on the job, with a view to discovering principles to be used in constructing valid tests for similar positions in the future.

B. Check on the reliability of tests and conduct studies leading to the construction of tests of adequate reliability.

1. Obtain estimates of reliability of individual tests.

2. Conduct item analyses to determine internal consistency of sets of items.

3. Conduct studies designed to help in eliminating items which tend to lower the reliability of tests.

C. Make item analyses of tests used and set up a system for maintaining records of the performance of items for various groups and examinations.

D. Conduct research on miscellaneous measurement problems.
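The statistics called for in A through C can all be computed from a matrix of item responses (1 = right, 0 = wrong). The sketch below computes four quantities:

- item difficulty, as the proportion of candidates answering correctly;
- an item-test correlation, as the product-moment correlation of each 0/1 item with the total score (the point-biserial; uncorrected, since the item is included in the total);
- a reliability estimate by Kuder-Richardson formula 20;
- a validity coefficient, as the correlation between total score and a criterion measure.

The choice of these particular formulas is an assumption, since the outline names none. The data are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation, using population formulas throughout."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))


def item_analysis(responses: list[list[int]]) -> list[tuple[float, float]]:
    """For each item: (difficulty p, item-total correlation).

    For a 0/1 item the product-moment r with the total is the point-biserial.
    It is uncorrected: the item itself is part of the total score."""
    totals = [sum(row) for row in responses]
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        p = mean(item)
        # An item everyone passed or failed has no variance and no correlation.
        r = pearson_r(item, totals) if 0 < p < 1 else float("nan")
        results.append((p, r))
    return results


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 internal-consistency reliability."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    p_values = [mean([row[j] for row in responses]) for j in range(k)]
    sum_pq = sum(p * (1 - p) for p in p_values)
    return (k / (k - 1)) * (1 - sum_pq / pvariance(totals))


if __name__ == "__main__":
    # Rows are candidates, columns are items (hypothetical data).
    responses = [
        [1, 1, 1, 0],
        [1, 1, 0, 1],
        [1, 0, 1, 0],
        [0, 1, 0, 0],
        [1, 1, 1, 1],
    ]
    criterion = [3.1, 2.8, 2.2, 1.5, 3.6]  # e.g. later job-performance ratings
    for j, (p, r) in enumerate(item_analysis(responses), start=1):
        print(f"item {j}: p = {p:.2f}, item-total r = {r:.2f}")
    print(f"KR-20 = {kr20(responses):.2f}")
    totals = [sum(row) for row in responses]
    print(f"validity r = {pearson_r(totals, criterion):.2f}")
```

Keeping the per-item `p` and `r` values from each administration would give the kind of item record that C calls for.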














THE ARRANGEMENT OF CHOICES IN MULTIPLE 
CHOICE QUESTIONS AND A SCHEME FOR 
RANDOMIZING CHOICES 


CHARLES I. MOSIER and HELEN G. PRICE
State Technical Advisory Service, Social Security Board 


In the construction of objective test items in multiple-choice 
and allied forms, the arrangement in order of correct answer 
and choices is often a troublesome chore. Some test writers 
prefer certain positions for the correct choice, finding that placing
the correct answer in a position other than first or last
increases the difficulty of the item;¹ most, however,
resort to one device or another to secure a systematic,
truly random arrangement. When the location of the correct 
choice in the series of choices is left to the whim of the test 
constructor, personal position-preferences will almost inevita- 
bly result in a preponderance of correct answers falling in one 
position. Moreover, since distracters tend to be written in 
order of plausibility, with the last distracter often written as a 
desperate final effort, a randomization process should extend 
beyond the correct choice to the incorrect ones as well. The 
present paper presents a “randomizer” for five-choice items, a 
discussion of its use, and a simple method by which other simi- 
lar aids may be constructed. Some of the situations in which 
it should not be used are also considered. 

In writing multiple-choice items, we at the State Technical 
Advisory Service follow certain mechanics designed to simplify 
the writing process and to provide needed controls on quality. 
After the premise or the question has been formulated, the in- 
tended answer is always written first, with the incorrect choices 
following. This practice of having the intended answer written 


1 McNamara, W. J. and Weitzman, E. “The Effect of Choice Placement on the 
Difficulty of Multiple Choice Questions.” Journal of Educational Psychology, 
XXXVI (1945), 103-113. 
as the first choice has a number of advantages. It insures that
a correct choice is included and that the item writer, in his zeal 
to prepare plausible distracters, does not end with five plausible 
choices, all wrong. (Such things have happened.) Moreover,
in case of any later doubt whether the answer indicated on the
scoring key is the one intended, a quick check against
the original draft will remove any question about the writer’s 
intent. Editorial review and checking are also facilitated. 

After the items are reviewed for authenticity and edited for 
grammatical construction and technical form, the alternative 
answers are assigned their final, random order by use of the 
table attached. The table, constructed for five-choice ques- 
tions, shows the 120 permutations of the numbers one through 
five. In preparing the table the permutations were written in 
systematic, cyclic order and each permutation was assigned a 
sequence number from one through 120. Each permutation 
was then assigned, as its final position in the table, the order in 
which its sequence number occurred among the last three digits 
of a nine-place table of logarithms. 
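The construction just described can be imitated in a few lines. The sketch generates every permutation in a systematic order (lexicographic, rather than the authors' cyclic order). It then substitutes a seeded pseudo-random shuffle for the authors' ordering by the last three digits of a table of logarithms. The function name and seed are assumptions. Passing `k=3` or `k=4` builds the smaller randomizers suggested later for three- and four-choice items.

```python
import itertools
import random


def make_randomizer(k: int = 5, seed: int = 1945) -> list[str]:
    """Return all k! orderings of the choice numbers 1..k in shuffled order.

    itertools.permutations yields the orderings systematically; a seeded
    shuffle (a stand-in for the logarithm-table ordering) then fixes each
    ordering's final place in the table, reproducibly for a given seed."""
    if not 2 <= k <= 9:
        raise ValueError("k must be between 2 and 9 (single-digit choice numbers)")
    digits = "".join(str(n) for n in range(1, k + 1))
    patterns = ["".join(p) for p in itertools.permutations(digits)]
    random.Random(seed).shuffle(patterns)
    return patterns


if __name__ == "__main__":
    table = make_randomizer()
    # Print in rows of six, like the printed table of 120 patterns.
    for start in range(0, len(table), 6):
        print(" ".join(table[start:start + 6]))
```

Because every permutation appears exactly once, no pattern can repeat or omit a choice number, and each pattern is used once per pass through the table.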

The use of this table has these advantages over other sys- 
tems of randomization: (1) only the numbers actually used 
need be considered; (2) there are no repetitions or omissions 
of choice numbers for any item; and (3) the order of all five 
choices is given simultaneously. In applying the table, the item 
writer assigns to each successive item the choice patterns in the 
order in which they occur, beginning each new group of items 
where the previous one left off; every choice pattern is thus used 
once before any pattern is repeated. 
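Applying the table can be sketched the same way. Following the text, the first digit of a pattern gives the final position of the intended answer, which is always written first. The sketch assumes, as a natural extension, that the second digit places the first distracter as written, the third digit the second distracter, and so on. Patterns are drawn in order and reused only after the whole list has been used, via `itertools.cycle`. The three patterns in the example are copied from the first row of the printed table; the items are hypothetical.

```python
import itertools
from collections.abc import Iterator


def apply_pattern(choices_as_written: list[str], pattern: str) -> list[str]:
    """Place choice i (as written, intended answer first) at position pattern[i]."""
    expected = [str(n) for n in range(1, len(choices_as_written) + 1)]
    if sorted(pattern) != expected:
        raise ValueError("pattern must be a permutation of 1..number of choices")
    final = [""] * len(choices_as_written)
    for choice, position in zip(choices_as_written, pattern):
        final[int(position) - 1] = choice
    return final


def arrange_items(items: list[list[str]],
                  patterns: list[str]) -> Iterator[tuple[list[str], int]]:
    """Yield (final order of choices, number of the correct choice) per item,
    using every pattern once before any pattern is repeated."""
    source = itertools.cycle(patterns)
    for choices in items:
        pattern = next(source)
        yield apply_pattern(choices, pattern), int(pattern[0])


if __name__ == "__main__":
    items = [["1620", "1607", "1628", "1636", "1776"],
             ["four", "one", "two", "three", "five"]]
    for final, key in arrange_items(items, ["35124", "13425", "12453"]):
        print(key, final)
```

With the pattern 2 4 3 5 1 used in the text, `apply_pattern` puts the intended answer in position 2. Items whose choices must stay in a natural order, such as dates, would use only the first digit, as the authors describe.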

The present table can be used for three- and four-choice 
items as well. Such use, however, loses the principal
advantages enumerated above, and it would seem far simpler to use
the same procedures in constructing similar “randomizers” for 
those types of items. 

For certain types of items the choices should not be random- 
ized. When the choices represent selections from a meaningful, 
ordered series, e.g., dates or magnitudes, it is far less confusing 
to the candidate if they are arranged in their natural order. 
Even where the order of choices is fixed, e.g., by a series of 
dates, the randomizing table can be used to advantage in locating
the correct choice objectively, thus determining the number
of dates the item writer can select which should precede and 
follow the correct one. By using the table the item writer can 
select the choice pattern, assign to the correct choice the first 
number in the pattern, and distribute the distracters around
the correct choice. Thus, if the choice pattern were 2 4 3 5 1 and the
question were: 

“The year in which the Pilgrims landed at Plymouth Rock 
is”: the intended answer would be choice 2 and the completed 
arrangement of choices might be: 

(1) 1607; (2) 1620; (3) 1628; (4) 1636; (5) 1776. 

A predetermined choice pattern should be used, of course, 
only when the incorrect choices are selected because of their 
association with the question asked; it is more important to 
have effective distracters than to follow a prescribed order. 

Another situation in which the choices should not follow a 
randomized pattern is that in which the choices include any of 
the numbers one to five as answers. In these items, the number 
of the correct choice should be the same as the choice itself; 
thus: 

“The reciprocal of .25 is: (1) one (2) two (3) three (4) 
four (5) five.” 

It is sufficiently confusing to the candidate to have to 
answer the question without having to remember, as he might 
if the choices were randomized, “The answer to the question is 
four but ‘four’ is choice number 3 and it is not the answer, four, 
but the number of the answer, 3, which I must mark in the 
answer booklet.” The problem posed by this type of answer 
can, of course, be met by shifting from the designation of choices 
by number to the designation by letter. The use of letters, 
however, has disadvantages and the other solution seems 
preferable. 

There is another situation in which it seems desirable to 
modify the random order of choices. The discussion of this 
situation is presented here as an hypothesis partially borne out 
by observation rather than as one verified by evidence. When 
the choices presented include a best answer and another which, 
although not the best, is very nearly as good, the writers have
observed that the item will very frequently have a negative 
item-test coefficient if the nearly-as-good choice precedes the 
best answer; the coefficient is far more likely to be positive if 
the order of these two choices is reversed. Apparently the high- 
scoring candidates read until they come to the not-quite-so- 
good choice, recognize it as an acceptable answer, give it, and 
turn immediately to the next question. The lower-scoring stu- 
dents, unable to find an answer which their knowledge will per- 
mit them to identify as correct by recognition, make a careful 
comparison among the alternatives and have a greater proba- 
bility of success than the high-scoring group. Reversing the 
order of the two choices makes the item easier, but tends to 
correct its negative relation with total score. Whether such a 
change will have the desired effect of increasing the discrimina- 
tory power of a particular item must, however, be weighed very 
carefully in the light of the choices in question, the function to 
be served by the item, and the group of candidates for which 
it is intended. 
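The item-test coefficient can be computed in more than one way. The minimal sketch below assumes it is the point-biserial correlation between the item, scored right (1) or wrong (0), and the total score. The function name `item_test_r` and the eight candidates' scores are hypothetical, arranged so that the highest scorers mostly stop at the nearly-as-good choice.

```python
from statistics import mean, pstdev


def item_test_r(item, total):
    """Point-biserial correlation between a right/wrong item and total score.

    item  -- 1 if the candidate marked the keyed (best) answer, else 0
    total -- each candidate's total test score, in the same order
    """
    p = mean(item)                     # proportion marking the keyed answer
    if p in (0, 1):
        raise ValueError("every candidate answered alike; r is undefined")
    m_right = mean(t for i, t in zip(item, total) if i == 1)
    m_wrong = mean(t for i, t in zip(item, total) if i == 0)
    # (M1 - M0) / s * sqrt(pq), with s the population SD of total scores
    return (m_right - m_wrong) / pstdev(total) * (p * (1 - p)) ** 0.5


# Hypothetical data: most of the highest scorers stop at the
# nearly-as-good choice, while lower scorers find the best answer.
totals = [40, 38, 35, 30, 22, 20, 18, 15]
keyed = [0, 0, 0, 1, 1, 1, 1, 0]
print(round(item_test_r(keyed, totals), 2))
```

Because the coefficient compares the mean total score of those who mark the keyed answer with that of those who do not, an item answered correctly mainly by low scorers comes out negative.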


NUMBERS FOR RANDOMIZING 5-CHOICE ITEMS 


35124 13425 12453 24531 54231 42153 
34215 43512 54132 32541 32154 25143 
12354 43251 15324 52314 21453 24135 
25134 51243 25314 42315 54321 52143 
14523 13254 15423 51342 12543 34152 
24513 43152 54312 42531 23451 52134 
52431 31245 13524 51234 21534 15342 
31452 42513 12534 14532 35412 23514 
54213 25341 43521 41325 53421 34251 
53412 34521 21435 32514 51432 32145 
43125 13542 35241 41352 52341 51324 
23541 13245 45132 31542 54123 43215 
45231 41235 24351 31425 23415 41523 
21354 14235 23145 52413 35214 24153 
15243 53124 21543 31524 25431 41532 
42135 21345 25413 12435 35142 34125 
31254 45213 32415 12345 51423 23154 
14253 45321 53214 35421 14352 24315 
53142 13452 42351 45123 32451 45312 
15432 41253 14325 53241 15234 34512 
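Applying one entry of the table to an item can be sketched as follows. This excerpt does not state how the digits are to be read, so the sketch assumes that the k-th digit names the original choice printed in position k; under that reading "35124" prints the original third choice first. The helper names are hypothetical, and `fresh_order` simply draws further arrangements with Python's `random.sample` rather than reproducing the printed table.

```python
import random


def apply_order(choices, order):
    """Rearrange five choices according to one table entry such as "35124".

    Assumed reading (not stated in the excerpt): the k-th digit names the
    original choice that is printed in position k.
    """
    if sorted(order) != list("12345"):
        raise ValueError(f"{order!r} is not an arrangement of 1-5")
    return [choices[int(digit) - 1] for digit in order]


def new_key(order, old_key):
    """Position (1-5) at which the originally keyed choice now appears."""
    return order.index(str(old_key)) + 1


def fresh_order(rng=random):
    """Draw another random arrangement of the kind printed in the table."""
    return "".join(rng.sample("12345", 5))


choices = ["first", "second", "third", "fourth", "fifth"]   # hypothetical item
order = "35124"                                             # first entry of the table
print(apply_order(choices, order))   # choices in their randomized order
print(new_key(order, 1))             # where the originally first choice now sits
```

Keeping the answer key in step with the rearrangement (`new_key`) is the bookkeeping step that is easiest to get wrong when the order is changed by hand.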
THE RATIONALE OF TEMPERAMENT TESTING
DONCASTER G. HUMM 


Personnel Service, Los Angeles, California


Temperament


TEMPERAMENT, according to the dictionary, has to do with 
internal constitution. It also has to do with the peculiar physi- 
cal and mental characteristics that influence an individual’s 
disposition, or the character of mind or mental reactions having 
to do with his behavior. Hence temperament may be consid- 
ered as the pattern or complex of tendencies which determines 
an individual’s behavior. As such, it is made up of traits or 
tendencies to respond in a consistent manner whenever a given 
type of situation arises. Each individual has an abundance of 
traits arising out of the interaction of his original nature and 
his environment; some are chiefly the result of hereditary 
forces; others, of non-hereditary forces; and still others, of 
mixed forces. Some inhibit the effect of others while some tend 
to reinforce others. 

Temperament may be analyzed in many different fashions 
depending upon the psychological attack of the author. Thus, 
Freud analyzes it into the following three tendencies to react: 
(1) toward self-preservation, (2) toward race-preservation— 
or sex, (3) toward gregariousness. Jung, with his emphasis 
upon attitudes, divides temperament into two great types: the 
introverts and extroverts. This oversimplification, however, 
should not be accepted as Jung’s final analysis, since he sub- 
divides introverts and extroverts into several different sub- 
classes and also considers the ambivert, the individual who has 
characteristics that are both extroverted and introverted. 

One of the most useful analyses of temperament is Rosan- 
off’s (1). This analysis has the merit of reflecting the practical 
experience of many prominent students of personality including 
Leri, Birnbaum, Kraepelin, Spratling, Davenport, Dostoyevski,
and Flaubert. Rosanoff’s function has been that of editor plus 
that of integrator plus that of contributor. He has taken the 
practical experience of these men, added his concept of the con- 
trol and directive component of temperament, and integrated 
all of this psychiatric experience into a comprehensive analysis 
of the field. It should be noted that this analysis is chiefly the 
result of the experience, observations, and research of psychia- 
trists dating back as far as the 1870’s. We used it as the basis 
of the Humm-Wadsworth Temperament Scale because it had 
already demonstrated its value to us by explaining problems 
of behavior in clinical and industrial use. 


The Characteristics of a Good Temperament Test 


The first characteristic of a good temperament test is that 
it is based upon a comprehensive and valid analysis of tempera- 
ment. As such, a temperament test may be based on any of 
the three analyses we have mentioned or on any analysis which 
is sufficiently comprehensive to cover the reaction pattern of an 
individual. 

The second characteristic of a good temperament test is 
adequate standardization. Adequate standardization starts 
with a good sampling of the temperamental traits which must 
be explored in a comprehensive analysis. These samples ordi- 
narily consist of questions or items which may have a variety 
of forms. They may be multiple-choice, false-or-true, or any 
of the types of items which will bring to light the basic traits 
possessed by the test subjects. Each of these test items must 
be subjected to a careful item analysis to determine whether or 
not it actually elicits the information desired. 

It is important in constructing a temperament test to take 
account of the purpose for which the test is to be made since 
the attitude with which the individual responds to the test will 
influence his answers. If the test is to be used for clinical pur- 
poses, the control subjects on whom the test is standardized 
should have the atmosphere of the clinic in which to respond 
to the items or answer the questions. If the test is to be an
industrial test, the atmosphere which pervades the testing of 
applicants or workers for industry must also be present when
the control subjects respond to the questions. However, it has 
been possible in our experience to develop measures by which 
compensations for subjects’ attitudes can be made so as to in- 
crease the usefulness of the test. 

Thus, in the standardization of the Humm-Wadsworth 
Temperament Scale (2) two measures of the subject’s attitude 
were studied: the No Count and the Profile Count. These 
revealed a tendency on the part of the subject to report his 
temperamental tendencies in an atypical manner, to overreport
them, or to underreport them. In the Manual of Directions (3)
which accompanies the Humm-Wadsworth Temperament Scale,
statistical compensations are reported which make it
possible to consider overreported and underreported Scales as 
though they had been typically reported. This subject is 
further considered in the Manual of Interpretation (4). There 
has also been provided a Nomograph (5) to make it possible 
to make these compensations more easily. A simple explana- 
tion of the compensations is also included in Personnel Evalu- 
ation Method (6). 
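The compensations themselves are given in the Manual and the Nomograph and are not reproduced here. As a hypothetical illustration of how a count of “No” answers might flag an atypical response attitude, the sketch compares a subject's No Count with a norm group's mean and standard deviation. The direction assumed (many “No” answers read as underreporting, few as overreporting), the cutoff of 1.5 standard deviations, and the norm values are all assumptions for illustration, not the Humm-Wadsworth procedure.

```python
def no_count(answers):
    """Number of items answered "No"; answers are "Y" or "N" strings."""
    return sum(1 for a in answers if a == "N")


def response_attitude(count, norm_mean, norm_sd, z_cut=1.5):
    """Label a No Count relative to a norm group (hypothetical rule).

    Assumption for illustration only: an unusually high No Count is read as
    underreporting (denying tendencies), an unusually low one as overreporting.
    """
    z = (count - norm_mean) / norm_sd
    if z > z_cut:
        return "underreported"
    if z < -z_cut:
        return "overreported"
    return "typical"


answers = ["N", "Y", "N", "N", "Y", "N"]          # hypothetical answer sheet
print(no_count(answers))
print(response_attitude(100, norm_mean=80, norm_sd=10))
```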

After the items (or questions) of the test have been evaluated,
the next step is the construction of norms. In this regard,
temperament tests are very different from other types of tests 
such as interest inventories, intelligence tests, skill tests, and 
the like. It is very difficult, if not impossible, to construct a 
temperament test with only a single set of norms. This arises 
out of the peculiar nature of temperament. 

In intelligence-test construction, the objective is to find a 
measure by which the subject may be compared with the whole 
population, so the procedure is to standardize the projected test 
on a control group which represents as nearly as possible a 
cross-section of the population. In temperament-test construc- 
tion many of the tendencies we are trying to measure are very 
difficult, and perhaps impossible, to identify in average well-controlled
individuals by any means available before the test is
made. Moreover, some of these tendencies occur with less fre- 
quency than do others; in any survey of a cross-section of the 
population, they appear as statistically unimportant, but in the 
individuals in whom they are strong they have the utmost
importance. 

In standardizing the Humm-Wadsworth Temperament 
Scale we hit upon the device of using control subjects, selected 
by case study, who represented extreme examples of the ten- 
dencies we wished to measure either by possessing the tenden- 
cies to a high degree or by not possessing them at all. The 
following tabulation will illustrate our use of such control 
groups. 


Control Groups Used in Standardizing the Humm-Wadsworth Temperament Scale


Components   Plus Groups                               Minus Groups
“Normal”     Strongly “Normal” Subjects                State Hospital Patients
Hysteroid    Habitual Criminals                        Self-Sacrificing Persons
Manic        Excitable, Emotional Subjects             Subjects lacking Manic Traits
Depressive   Strongly Depressive Subjects              Subjects lacking Depressive Traits
Autistic     Shy, Seclusive Subjects                   Subjects lacking Autistic Traits
Paranoid     Aggressive, Opinionated Subjects          Subjects lacking Paranoid Traits
Epileptoid   Subjects given to Epileptoid Tendencies   Subjects lacking Epileptoid Traits
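The article does not say which item statistics were computed from these plus and minus groups. One simple possibility, sketched here under that assumption, is to compare the proportion of each group answering “Yes” to an item and keep the items on which the two groups differ most. The answer data and the 0.30 threshold are hypothetical.

```python
def item_differences(plus_group, minus_group):
    """Per-item difference in the proportion answering "Yes" (plus minus minus).

    Each group is a list of subjects; each subject is a list of booleans,
    one per item (True = "Yes").
    """
    n_items = len(plus_group[0])

    def p_yes(group, j):
        return sum(subject[j] for subject in group) / len(group)

    return [p_yes(plus_group, j) - p_yes(minus_group, j) for j in range(n_items)]


def keep_items(differences, threshold=0.30):
    """Indices of items whose groups differ by at least `threshold`.

    A large negative difference also discriminates; such an item would be
    scored for the answer "No".
    """
    return [j for j, d in enumerate(differences) if abs(d) >= threshold]


plus = [[True, True, False], [True, False, False], [True, True, True], [True, False, False]]
minus = [[False, True, False], [False, False, True], [True, True, False], [False, False, True]]
diffs = item_differences(plus, minus)
print(diffs, keep_items(diffs))
```

Using groups at the two extremes of a tendency makes these differences as large as they can be, which is the advantage the author claims for selecting control subjects by case study.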


The Scale as developed in this way then gave us a descrip- 
tion of the individual’s disposition, a measure of his mental 
health, and a comparison of his tendencies to react with those 
of other typical groups. Thus, a temperament test describes 
the disposition of the subject, estimates his powers of self- 
mastery and self-control, and compares his reaction pattern 
with the reaction pattern of other subjects of known character- 
istics. Having provided for such a picture of test subjects by 
comparison of their scores with those of specially selected sub- 
jects we proceeded to learn the meaning of our scores in terms 
of the average, that is the general, population. This was accom- 
plished by giving the test to a large group of adults (all of the 
employees, from president to unskilled laborers, of a company 
which had not previously used tests). This group is probably 
not a perfect cross-section of the whole population but we have 
good reason to believe that it is a satisfactory sampling. 

The distribution of scores afforded by this survey gave us 
information as to the average strength in well-adjusted adults 
of the tendencies measured. We found, for example, that ten-
dencies to be sociable, cheerful, active, and emotionally respon- 
sive to the environment are relatively common, while tendencies 
to be conceited, suspicious of the motives of others and stub- 
bornly fixed in one’s opinions, are fairly rare. 
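The point that one set of norms is not enough can be made concrete: the same raw score has a very different standing in the general population than in a specially selected group. The sketch computes a percentile rank (ties counted as half) against two hypothetical score distributions; it is a generic norming calculation, not the scoring procedure of the Humm-Wadsworth Scale.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Percent of the norm group scoring below `score`, counting ties as half."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical raw scores on one component.
general_population = [3, 5, 6, 6, 7, 8, 9, 10, 12, 15]
plus_group = [12, 14, 15, 16, 18, 19, 20, 22, 24, 25]

score = 15
print(percentile_rank(score, general_population))   # standing among employees at large
print(percentile_rank(score, plus_group))           # standing among an extreme (plus) group
```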

We are frequently asked by personnel men and by students 
why we cannot provide for an overall score which might be 
taken as a general measure of good or poor temperament. We 
do not do this because an important consideration in the use 
of temperament tests is the identification of patterns of be- 
havior tendency. Thus, such a test really is a battery of tests 
rather than a simple measure. The interrelationships among 
the various measures included in the battery are quite certainly 
more important than the strength of the individual components 
of temperament considered separately. This consideration of 
temperamental patterns or syndromes enormously complicates 
both the construction and interpretation of temperament tests. 
As a result, the problem of making a temperament test becomes 
an expensive and time-consuming project, and the problem of 
interpreting the completed test is one which requires the acqui- 
sition of special skills. I suspect that all types of tests would 
gain in usefulness if we would pay more attention to the spe- 
cialized problems of interpretation each type presents. 
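Why a single total would discard the information the author cares about can be shown with two hypothetical profiles over the seven components named in the control-group tabulation. Both sum to the same total, yet their shapes are opposed. The Pearson correlation between profiles is used here only as one generic index of pattern similarity; it is an assumption, not the author's interpretive method.

```python
COMPONENTS = ["Normal", "Hysteroid", "Manic", "Depressive",
              "Autistic", "Paranoid", "Epileptoid"]


def total_score(profile):
    """The single overall score the author declines to report."""
    return sum(profile[c] for c in COMPONENTS)


def profile_r(a, b):
    """Pearson correlation between two profiles taken across the components."""
    xs = [a[c] for c in COMPONENTS]
    ys = [b[c] for c in COMPONENTS]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical component scores for two subjects.
subject_a = dict(zip(COMPONENTS, [10, 2, 8, 2, 2, 2, 4]))
subject_b = dict(zip(COMPONENTS, [2, 8, 2, 8, 4, 4, 2]))
print(total_score(subject_a), total_score(subject_b))   # identical totals
print(round(profile_r(subject_a, subject_b), 2))         # opposed shapes
```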

I have mentioned the problem arising out of the attitude 
with which the subject approaches the test situation. In intel- 
ligence testing and skill testing it may be assumed that most 
subjects will do the best they can—except, perhaps, for such 
situations as those in which a criminal might feign feeble- 
mindedness or a soldier try to conceal a skill which would lead 
to an undesired assignment. 

In temperament testing, however, all sorts of complexities 
affect the subject’s responses. There are, of course, no “right” 
answers or “wrong” answers. Each answer will be true for some 
subjects and untrue for others. It is often supposed that the 
expected or favorable answer would be easily recognized and 
would be selected by all or most applicants for jobs, but this 
does not happen. Some subjects seem to be more suggestible 
than others, either positively or negatively; some seem to lean 
over backwards to claim unfavorable traits; others seem to
deny the possession of even desirable characteristics. A suc- 
cessful temperament scale must include in its scoring and inter- 
preting procedures means of taking account of these tendencies. 
We have found that certain relationships among the scores 
reveal the effect of these attitudes, and compensation for such 
attitudes can be made. 


Use of Temperament Tests 


As noted previously, a good temperament test may be used 
for a prediction of behavior since it will reveal the status of the 
subject’s mental health, it will describe his disposition, and it 
will compare his behavior tendencies with others. However, a 
temperament test cannot be used as a prediction of behavior 
unless the situation in which this behavior is to occur is care- 
fully taken into account and unless the other factors of person- 
ality, aside from temperament, are also taken into account. 
This follows from the fact that a temperament test reports ten- 
dencies—tendencies which are operative only in the presence of 
trigger situations. Thus, a temperament test should be so con- 
stituted as to report the probable behavior of an individual 
when he is free from undue strains and also report his probable 
behavior when he overcompensates for strains. 

All this makes the estimate of the situation and the estimate 
of other factors influencing behavior, as summarized in the esti- 
mate of probable strains, an important consideration in the use 
of temperament tests. 

The situation in which an individual is placed may or may 
not be of such a nature as to be conducive to a tranquil, acceptable
adjustment of the individual. If the individual is in a compatible,
sympathetic, and kindly atmosphere, it may be taken for granted that
strain in such a situation will be reduced to a minimum. If, however,
the situation has anything in it which is likely to put the indi- 
vidual on the defensive or likely to give rise to contention or 
other forms of unpleasantness, it can be predicted that the indi- 
vidual will undergo strain. Thus, it follows that the findings 
of a temperament test alone are not sufficient to predict how 
an individual will respond in any given special situation. For
example, it is not possible to predict from the findings of a tem- 
perament test how well a student will adjust to college unless 
it is also possible to predict how well the student will like the 
college atmosphere, how suitable his course will be for his apti- 
tudes and interests, and how well he will be received. Similarly, 
it is not possible to predict how well a worker will get along with 
a group of workers unless it is possible to predict how well he 
will like the group, including his boss, how well the group will 
like him and get along with him, and how well the job will fit 
him. 

There are several factors in the constitution of the indi- 
vidual, aside from his temperament, that have an influence on 
his behavior. Some of these are in the field of aptitude. For 
example, if an individual is placed in a business situation where 
his intelligence is not adequate, one must expect an undue strain 
to result. If he is placed where his intelligence is so superior to 
the job that it is very incompletely utilized, one must expect 
another sort of strain—that of boredom. This reaction is also 
to be expected with reference to skill. A highly skilled worker 
placed in a job which makes demands for mediocre skill is likely 
to become dissatisfied and get into mischief. A worker who is 
placed in a position which requires more skill than he possesses 
is likely to become discouraged or defensive or in some other 
way to compensate for his feelings of inadequacy. 

Health is also an important consideration in estimating 
strain. For example, a man of super-abundant energy with 
considerable pressure of activity cannot be tied down to an 
inactive job without the expectation of some over-compensation 
on his part. Likewise, an individual who is struggling in a job 
beyond his strength is likely to suffer, not only with respect to 
his physical health, but also with respect to his mental health. 

It follows that the prediction of behavior can be accom- 
plished by the use of a temperament test if the findings of that 
test are supplemented by findings of other sorts—probably in- 
cluding non-test data as well as test data—to predict the 
amount of strain the individual may be expected to endure in 
the situation or situations under consideration.


Temperament Testing in Clinical Practice


Valid temperament tests are useful in studies of individuals 
for vocational guidance, educational guidance, and problems of 
social and marital adjustment. In many instances a good tem- 
perament test will indicate whether or not a personal problem 
will be complicated by a poor state of mental health or by an 
insufficiency of self-control or self-mastery to direct dynamic 
temperamental qualities. It should be the practice of a psy- 
chologist, however, to refer problems of mental health to psy- 
chiatrists for examination. Whenever there is any question of 
psychosis, psychoneurosis, or a psychopathic state, it is neces- 
sary to consider not only the behavior of the individual but also 
the nature of the handicap or disablement. This makes it 
essential to secure a medical diagnosis as well as a psychological 
diagnosis. Only psychiatrists are professionally equipped to
consider both these phases.

A painstakingly thorough study of personality is required 
in the consideration of the readjustment of the individual. It 
seems reasonable that the minimum points to be covered are 
the following: (1) family history, at least as far back as the 
grandparents, in which noteworthy achievements and handi- 
caps are taken into account; (2) personal history from concep- 
tion, including childhood, adolescence and adulthood; (3) a 
particularized history of difficulties in making adjustment— 
especially the failures in school and social and job adjustments; 
(4) a physical examination by a competent physician; (5) a 
preliminary mental examination by means of a valid tempera- 
ment test followed by a verification of the results in a personal 
interview or by psychiatric examination; (6) an interest exami- 
nation by a standardized interest inventory; (7) examination 
of skills and aptitudes by competent tests; (8) examinations by 
intelligence tests—preferably by an individual intelligence test; 
(9) the analysis of all of the data obtained in steps one to eight 
and a report to the subject; (10) a written summary and report 
to the subject. 

The use of such a procedure is very likely to be effective in 
substantiating and explaining the individual tests by the results 
of the tests in other fields. Such a procedure is more than 
likely to bring to light the extent to which the individual has
undergone strain, the character of that strain, and its probable 
effect upon his temperamental integration. 


Temperament Tests in Industry 


Temperament tests in industry are valuable to supplement 
aptitude tests and data obtained by non-testing procedures. 
Aptitude tests tend to reveal to the technician what the indi- 
vidual can do; temperament tests, what he will do; and interest 
tests, what he likes to do. The integration of testing methods 
and non-testing methods in a routinized procedure is very likely 
to prove more effective than the use of either test procedures 
or non-test procedures alone. A good industrial appraisal pro- 
gram probably should include the following: 

(1) A standardized application form or job-specification- 
and-qualification sheet. This form should contain spaces for 
background, training, experience, job titles, and job duties. 

(2) Intelligence tests; if group tests are used, at least two 
should be included. When possible, one of these should be a 
timed test and one an untimed test. Some individuals do not 
respond well to timed tests. 

(3) A temperament test; this test determines the indi- 
vidual’s self-mastery and self-control, the strength of his tem- 
peramental characteristics, and his behavior tendency pattern. 

(4) An interest inventory; this measure determines whether 
or not the individual’s interests are such as to make him con- 
tented in the type of work being considered. 

(5) Skill or aptitude tests; skill tests are to be preferred 
where the individual is already trained for the contemplated 
job. Aptitude tests are to be preferred where the individual 
is a trainee. 

(6) Physical examination by the company physician. 

(7) Summary of all of the data considered in the foregoing 
six procedures, a listing of assets and liabilities with regard to 
the job, and a statement of job risk. 

Such a set of procedures as this can be so routinized as to take 
less than three hours’ time. The fact that many of the tests are 
group tests makes it possible to test many individuals simul-
taneously. However, the most important feature of such a set 
of procedures is its thoroughness. After all of these points have 
been covered, it is possible to have such an understanding of the 
potentialities of the worker as can be used for selection, place- 
ment, counseling, supervision, and readjustment. 


Summary 


Temperament is one aspect of personality, but not the whole 
of personality. It is the pattern or complex of tendencies which 
express themselves in behavior in the presence of trigger situ- 
ations. 

The measurement of temperament requires: (1) a valid and 
comprehensive analysis of temperament as a base of departure; 
(2) items or questions which adequately sample the field of 
temperament; (3) adequate item analysis; (4) norms which 
afford a description of temperament and comparisons with the 
population; (5) provision for dealing with atypical response 
attitudes. 

Temperament tests may be used for the prediction of be- 
havior when other pertinent facts are known; that is, when the 
environmental strain can be estimated. They cannot be used 
for such prediction unless environmental strains are considered. 
(Incidentally, environmental strain cannot be taken as a con- 
stant. Even the conditions of combat represent for some men 
a challenge or opportunity or release, while for others they 
represent only danger or sorrow or frustration.)

Temperament tests, properly used, can be valuable aids to 
the clinical and industrial psychologist for the information they 
give with respect to mental health, temperamental integration, 
strength of various temperamental characteristics, and behavior
patterns.


MECHANICAL ABILITY, ITS NATURE AND
MEASUREMENT. II. MANUAL 
DEXTERITY 


J. R. WITTENBORN 
Yale University 


Introduction 


The factor analyses of test samples employed in studies of 
mechanical and motor abilities by Harrell (3) and Wittenborn 
(6) have shown that the variables may be classified on the 
basis of their interrelationships. These classifications, or fac- 
tors, offer a functional basis for the definition of abilities. In 
the analyses of mechanical ability these factors appear to be of 
two types and for the sake of simple designation may be given 
the superficially descriptive labels of “mental” and “motor.” 
These rubrics are, of course, not explanatory, but they are 
appropriate insofar as the variables contributing to the “men- 
tal” abilities (scholastic, spatial visualizing, and perceptual) 
are considered to be independent of the exact mode of expres- 
sion. Tests contributing to the “motor” abilities (dexterity, 
repetitive movement, and steadiness) appear to be peculiarly 
dependent upon the quality of muscular performance. 

The present paper is concerned chiefly with “motor” abili- 
ties, particularly those which may be called “manual.” It is 
based primarily on data from the Experiment Proper of the 
Minnesota Mechanical Ability program of research (4). The 
Minnesota program had two aims: one was to predict “me- 
chanical ability” for a group of junior high-school boys in shop 
courses; the other was to understand the general nature of 
mechanical ability, something about its origins, and the condi- 
tions for its development. As a part of the Minnesota study 
of the nature of mechanical ability, the following variables from 
the Experiment Proper were intercorrelated: 
1. Age                                      26. Father’s mechanical operations
2. Otis, I.Q.                               27. Tools owned by son
3. Packing Blocks                           28. Tools owned by father
4. Card Sorting                             29. Things done questionnaire
5. Minn. Spatial Relations                  30. Mechanical occupations preferences
6. Paper Form Board                         31. Academic preferences
7. Stenquist Picture I                      32. Interest Analysis Blank (old)
8. Stenquist Picture II                     33. Gymnasium ranks
9. Minn. Assembly                           34. Academic grades
10. 100-yard dash                           35. Garfield’s Agility battery
11. Back dynamometer                        36. Minn. Agility battery
12. Right-hand dynamometer                  37. Interest Analysis Blank (new)
13. Steadiness                              38. Shop operations quantity-quality criterion
14. Left-hand dynamometer                   39. Education of father
15. 25-yard hop                             40. Education of mother
16. Spirometer                              41. Mechanical ability rating of father’s occupations
17. Broad jump                              42. Mechanical ability rating of other ancestors’ occupations
18. Height                                  43. Barr scale ratings of father’s occupations
19. Weight                                  44. Barr scale ratings of other ancestors’ occupations
20. Shop operations quality criterion       45. Otis, mental age
21. Shop operations information criterion
22. Cultural status
23. Literary interests
24. Recreational interests
25. Son’s mechanical operations


An examination of the intercorrelations revealed that 
numerous variables, such as numbers 22, 23, 24, 27, 28, 29, and 
30, bore no important relationships with other variables. Cer- 
tain other variables, such as 35 and 36, 39 and 40, 41 and 42, 43 
and 44, tended to form independent couplets and as a conse- 
quence were of no general interest. Mental age and other vari- 
ables relating to academic status were of no interest in the 
present study, and the sole steadiness test, variable 13, showed 
no important relationship with any of the other variables. 
Certain of the variables, however, showed significant interrela- 
tionships, and their nature suggested that further scrutiny 
might afford additional insight into the nature and organization 
of mechanical ability. These promising variables and their 
intercorrelations are presented in Table 1. 
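The screening step described above, setting aside variables with no important relationship to the others, can be sketched as a search for variables whose largest absolute correlation with any other variable falls below a cutoff. The author's criterion is not stated; the cutoff of .20 is an assumption, chosen because it is close to the two-tailed .05 critical value of r for N = 100, and the small matrix is hypothetical.

```python
def isolated_variables(names, r, cutoff=0.20):
    """Names of variables whose largest |r| with any other variable is below cutoff."""
    isolated = []
    for i, name in enumerate(names):
        largest = max(abs(r[i][j]) for j in range(len(names)) if j != i)
        if largest < cutoff:
            isolated.append(name)
    return isolated


# Hypothetical correlation matrix for four variables.
names = ["A", "B", "C", "D"]
r = [
    [1.00, 0.60, 0.10, 0.05],
    [0.60, 1.00, 0.15, 0.10],
    [0.10, 0.15, 1.00, 0.08],
    [0.05, 0.10, 0.08, 1.00],
]
print(isolated_variables(names, r))
```

This single pass does not catch the “independent couplets” the author also set aside (pairs correlated only with each other); those would need a second check on each variable's correlations outside its partner.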


An Analysis of the Minnesota Data


The 16 variables for which intercorrelations are shown in
Table 1 were selected with certain expectations. It was be- 
lieved, for example, that the pattern of their intercorrelations 
might confirm the tendency for measures involving a high de- 
gree of manual dexterity to form an independent functional 
classification (6). It was expected, moreover, that an analysis
of the selected intercorrelations would contribute to our under-
standing of the general nature of the dexterity factor. It
seemed to be particularly desirable to know to what degree
measures of strength and physical development contribute to
an ability such as manual dexterity.


TABLE 1
Intercorrelations of 16 Selected Variables (N = 100)

Variable       3     4     5     6     7     8     9    11    12    14    16    18
19          -.09  -.11  -.01  -.05   .18  -.04   .04   .59   .67   .68   .74   .78

The intercorrelations were subjected to a centroid analysis 
and four factors were extracted. No residual significantly 
greater than zero remained. When the centroid matrix, Table 
2, is postmultiplied by the transformation matrix, Table 3, the 
orthogonal rotated factor matrix, Table 4, is produced. 
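The postmultiplication described above can be checked numerically for a variable or two. The sketch takes the Table 2 centroid loadings for variables 3 and 12 and the Table 3 entries, and reads Table 3 so that each printed row supplies the weights for one rotated factor. That orientation is an assumption, chosen because under it the results agree with the Table 4 entries for these variables to within a few hundredths. The names `CENTROID`, `TRANSFORM`, and `rotate` are hypothetical.

```python
# Table 2: centroid loadings on factors I-IV for two of the 16 variables.
CENTROID = {
    "3. Packing Blocks": [0.31, 0.33, 0.39, 0.36],
    "12. Right-hand dynamometer": [0.53, -0.64, 0.10, -0.17],
}

# Table 3, read (by assumption) so that row k holds the weights that
# combine the four centroid factors into rotated factor k.
TRANSFORM = [
    [0.58, -0.72, -0.08, 0.39],   # rotated I
    [0.39, -0.18, 0.34, -0.83],   # rotated II
    [0.67, 0.59, -0.44, 0.00],    # rotated III
    [0.25, 0.32, 0.82, 0.39],     # rotated IV
]


def rotate(centroid_row, transform):
    """Rotated loadings for one variable: weighted sums of its centroid loadings."""
    return [sum(c * w for c, w in zip(centroid_row, weights)) for weights in transform]


def communality(loadings):
    """h^2: sum of squared loadings, which an exactly orthogonal rotation preserves."""
    return sum(a * a for a in loadings)


def max_departure_from_orthogonal(transform):
    """Largest |entry| of T T' minus the identity (0 for an exactly orthogonal T)."""
    n = len(transform)
    return max(
        abs(sum(a * b for a, b in zip(transform[i], transform[j])) - (1.0 if i == j else 0.0))
        for i in range(n) for j in range(n)
    )


for name, row in CENTROID.items():
    rotated = rotate(row, TRANSFORM)
    print(name, [round(a, 2) for a in rotated], round(communality(rotated), 2))
print(round(max_departure_from_orthogonal(TRANSFORM), 3))
```

Because the printed transformation is only approximately orthogonal after two-decimal rounding (its departure from the identity is a little over .01), the rotated communalities differ slightly from the h² column of Table 2.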

Although an orthogonal solution is given to the present 
problem, it is apparent that Factors I and II are not truly inde- 
pendent. The variables which cluster together to form Factor 
II have higher loadings on Factor I than on Factor II. It is 
apparent, therefore, that presentation of Factor II as a factor 
independent of Factor I is not in strict conformance with the
numerical solution in the present study. As the factors are dis-
cussed variable by variable, the data will be presented in the
form of factorial equations.¹


TABLE 2
Centroid Matrix

Variable                                    I     II    III     IV     h²
3. Packing Blocks                         .31    .33    .39    .36    .49
4. Card Sorting                           .24    .26    .45    .42    .50
5. Minn. Spatial Relations                .56    .52   -.12    .12    .61
6. Paper Form Board                       .46    .48   -.25    .09    .51
7. Stenquist Picture I                    .56    .23   -.12   -.21    .43
8. Stenquist Picture II                   .40    .43   -.23   -.15    .42
9. Minn. Assembly                         .59    .45   -.10   -.16    .58
11. Back dynamometer                      .61   -.47    .27   -.09    .67
12. Right-hand dynamometer                .53   -.64    .10   -.17    .73
14. Left-hand dynamometer                 .54   -.68    .13   -.18    .80
16. Spirometer                            .53   -.58   -.16    .23    .70
18. Height                                .49   -.64   -.17    .19    .71
19. Weight                                .55   -.67   -.17    .08    .79
20. Shop operations quality criterion     .55    .47   -.25    .01    .59
25. Son’s mech. operations                .35    .14   -.23   -.35    .32
37. Interest Analysis Blank (new)         .50    .37   -.22   -.14    .46


TABLE 3
Transformation Matrix

          I     II    III     IV
I       .58   -.72   -.08    .39
II      .39   -.18    .34   -.83
III     .67    .59   -.44    .00
IV      .25    .32    .82    .39


TABLE 4
Rotated Factor Matrix

Variable                                    I     II    III     IV     h²
3. Packing Blocks                         .05   -.11    .23    .65    .49
11. Back dynamometer                      .61    .48    .00    .18    .63
12. Right-hand dynamometer                .69    .49   -.06   -.07    .73
18. Height                                .82    .05    .03   -.16    .70
37. Interest Analysis Blank (new)        -.02    .17    .66    .01    .47

Factor I appears to be a size or a maturational factor. It is 
determined primarily by variables 16, 18, and 19. 


Factor I—Size

                                 I      II     III      IV      u²
11. Back dynamometer           .37     .23     .00     .03     .37
12. Right-hand dynamometer     .48     .24     .00     .00     .28
14. Left-hand dynamometer      .52     .27  (-).01     .00     .20
16. Spirometer                 .69     .00     .00  (-).01     .30
18. Height                     .67     .00     .00  (-).03     .30
19. Weight                     .71     .04     .00  (-).04     .21
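Footnote 1 defines the entries of these factorial equations as squared factor loadings, with u² the part of the variance not due to the common factors. A minimal sketch of that arithmetic follows, using the Table 4 loadings for variables 11, 12, and 18 and marking negative loadings with “(-)” as the tables do. The function names are hypothetical, and the rule of omitting the sign when the squared value rounds to .00 is inferred from the printed rows rather than stated by the author.

```python
def factorial_equation(loadings):
    """Squared loadings (variance due to each factor) and u^2 (unique variance)."""
    squares = [a * a for a in loadings]
    return squares, 1.0 - sum(squares)


def _cell(value):
    """Two decimals without the leading zero, e.g. 0.48 -> '.48'."""
    text = f"{value:.2f}"
    return text[1:] if text.startswith("0") else text


def format_equation(loadings):
    """One row in the article's layout; '(-)' marks a negative loading."""
    squares, u2 = factorial_equation(loadings)
    cells = []
    for a, sq in zip(loadings, squares):
        text = _cell(sq)
        if a < 0 and text != ".00":   # sign shown only when the entry is non-zero
            text = "(-)" + text
        cells.append(text)
    cells.append(_cell(u2))
    return "  ".join(cells)


# Rotated loadings taken from Table 4.
print(format_equation([0.61, 0.48, 0.00, 0.18]))     # 11. Back dynamometer
print(format_equation([0.69, 0.49, -0.06, -0.07]))   # 12. Right-hand dynamometer
print(format_equation([0.82, 0.05, 0.03, -0.16]))    # 18. Height
```

Each printed row reproduces the corresponding line of the Factor I table, which is a quick way to confirm that the equations and the rotated matrix describe the same solution.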


Approximately 70 per cent of the total variance of each of these 
variables is found in Factor I and no significant amount of vari- 
ance is contributed by these variables to any other factor. 
Tests 11, 12, and 14, which suggest a strength factor, Factor II, 
actually have most of their common factor variance and ap- 
proximately 50 per cent of their total variance in Factor I. 
This finding is of interest because in an analysis of data for 328 
youths who were older than the present group it has been found 
that strength and size are independent of each other (2). Be- 
cause of this, Factor I and Factor II are treated in the present 
study as independent of each other. The writer offers as addi- 
tional justification for this treatment the consideration that no 
additional understanding of the organization of the variables 
would result from rigorously defining Factor II as highly correlated
with Factor I.² Since the data of the present study do
not call for a strength factor independent of the size factor, this 
independence can only be considered as hypothetical. It is 
reasonable to find size and strength highly correlated among 
young boys and to expect these variables to become increasingly 
independent as maturation is attained. It is hoped that the
results of this study will have implications for the use of certain
types of tests in the selection and guidance of young adults.
It is hypothesized, therefore, that for such young adult groups
body size and strength of the upper parts of the body are
relatively independent of each other.

¹ Factorial equations are more revealing than simple factor loadings because they not only show how much of the total variance of the test is due to each factor, but they also show how much variance is not due to common factors, i.e., how much (u²) is unique to the test in the present sample. The values in the factorial equations are equal to the respective factor loadings squared.

² Actually the present data could be accepted as yielding 3 factors: size-strength, spatial ability, and dexterity. The four-factor solution is somewhat “forced” and justified by the above-mentioned considerations.

Factor II, the postulated strength factor, is of considerable 
interest in the present study because its variables do not con- 
tribute to the manual dexterity factor, Factor IV. 


Factor II—Strength 
                                    I      II     III     IV      U²
11. Back dynamometer ..........    .37    .23    .00    .03     .37
12. Right-hand dynamometer ....    .48    .24    .00    .00     .28
14. Left-hand dynamometer .....    .52    .27   (-).01  .00     .20


The contribution which Factor II makes to certain other vari- 
ables such as the Son’s Mechanical Operations variable and the 
Stenquist Assembly tests is meaningful insofar as strength of 
hands among boys would be expected to be associated with the 
use of the hands either as indicated directly by the Son’s opera- 
tions questionnaire or indirectly by the Stenquist Assembly 
tests which sample mechanical knowledge. The fact that vari- 
ables 3 and 4, the manual-dexterity variables, do not contribute 
to this factor in any way is taken as additional evidence that 
manual dexterity is a classification of ability quite independent 
of other types of manual ability (6). 

Factor III, the spatial relations factor, is defined by the
Minnesota Mechanical Assembly Test, the Minnesota Paper
Form Board Test, and the Minnesota Spatial Relations Test.


Factor III—Spatial Visualization 


                                          I       II     III     IV      U²
5. Minn. Spatial Relations ..........    .00     .00    .55    .07     .38
6. Paper Form Board .................    .00     .00    .49    .01     .50
7. Stenquist Picture I ..............    .00     .15    .22    .05     .58
8. Stenquist Picture II .............   (-).03   .08    .18    .14     .57
   (row illegible) ..................    .00     .06     ?     .02      ?
20. Shop operations quality criterion    .00     .00    .56   (-).04   .40
25. Son’s mechanical operations .....    .00     .10    .17   (-).02   .71
37. Interest Analysis Blank (new) ...    .00     .03    .44    .00     .53


It is most interesting to observe that variable 20, the Shop 
Operations Criterion, has all of its common factor variance and 
over 50 per cent of its total variance in this particular factor. 
In addition, mechanical interests, 37, and Son’s Operations in
mechanical activities, 25, are also highly correlated with this
factor.

The importance of measures of spatial ability as indices of 
mechanical promise is strikingly indicated by the nature of this 
factor. Not only the criterion but also interest in mechanical activities
appears to be quite independent of the three additional
factors which appear in this study and which might on an a 
priori basis be expected to contribute to a shop operations cri- 
terion of mechanical ability. 

Factor IV is perhaps the most interesting factor in the pres- 
ent study. 


Factor IV—Manual Dexterity 


                                I       II      III     IV      U²
3. Packing Blocks .......      .00    (-).01    .05     .42     .52
4. Card Sorting .........      .00    (-).03    .01     .45     .51


It is defined by two tests which appear to call for a type of 
manual dexterity. However, these tests do not contribute to 
the spatial-visualizing factor which in the light of this study 
is the mechanical-ability factor. Perhaps more surprising is the 
fact that neither the strength nor the size factor contributes in
any way to facility in manual dexterity as identified by this
factor. Although manual dexterity is an ability which has long 
been considered as a definable attribute, prior to the investiga- 
tions of this series its existence as a functional classification of 
ability had not been satisfactorily demonstrated. As a matter 
of fact the data presented in the present and in the preceding 
study may not be regarded as adequate to define satisfactorily 
an ability such as manual dexterity. This reservation is reason- 
able since block packing and card sorting were principal varia- 
bles in defining this factor in both of these studies. Although 
the studies were done on two samples and the factor occurs in 
two different test batteries, its existence requires further 
demonstration. 


Further Evidence for Identifying Manual Dexterity 


In order to shed more light on the existence and the nature 
of this factor additional data were sought for further scrutiny. 
Data suitable for this purpose were found in the Measurement
of Manual Dexterities by Earle (1). He had published the
intercorrelations of ten measures, nine of which were considered
to be measures of manual dexterity. The tenth was a criterion
of mechanical ability, the exact nature of which was not defined
in his publication. The intercorrelations of these variables for
a sample of 79 continuation Day-School boys are presented in
Table 5,³ and the nature of each of the performances is indicated
below:
TABLE 5
Intercorrelations for 79 Day-School Boys*

I    II   III   VI-I   VI-II   VII   VIII   IX   XIII   Crit.

* Median age, 14 years and 5 months.

³ Precise scoring methods used in securing these data are not specified by the author.


Test I. Tapping movement of forefinger using wrist. A lever
is tapped by the forefinger of the preferred hand while
the non-preferred hand holds the apparatus.

Test II. Tapping movements of several fingers in succession
using wrist. The individual taps with each of the
four fingers successively, beginning with the little
finger each time and tapping in order from the little
finger to the index finger as quickly as possible.

Test III. Twisting movements of finger and thumb with wrist
action. In this test the individual is required to turn
the barrel of the turnbuckle until the eye is as far in
the barrel as possible and then to reverse the direction
of rotation and turn the barrel as quickly as possible
until the eye is released.

Test VI. Part I: The individual is required to place 100 pegs
into a 100-hole peg board, picking up one peg
at a time.

Part II: The individual is to fill three rows of the
peg board, halting temporarily between rows.

Test VII. The individual is requested to place the pegs in the 
peg board using the thumb and each finger (except 
the forefinger) in succession to pick up the pegs. 

Test VIII. Placing pegs in holes under manipulative diffi- 
culties. 

Part I: The individual takes all the pegs from one 
row of the peg board and then replaces them 
keeping them in his hand as he works; 

Part II: The individual extracts all the pegs from 
two rows of the peg board and then returns 
them keeping them in his hand during the 
process. 

Parts III and IV are conducted under such 
manipulative difficulties as maintaining the 
operating hand full of pegs while working. 

Test IX. Placing pegs in holes which are not visible. In this 
test a tactual exploration is made by the free hand, 
usually the left, while the right hand is used to pick 
up the pegs. The subject is not blindfolded but a 
screen is placed between him and the board. It is 
required that the subject feel for the hole each time 
with his left hand. 

Test XIII. Discrimination between fine and coarse textures
by sense of touch. A screen is placed between the
individual and a tactual board upon which strips of
sandpaper are placed and manipulated so that the
individual can attempt to recognize by touch the
match for each of several different grades of
sandpaper.


Tests IX and XIII are of particular interest because they
are relevant to a question raised in the first paper in this series.
There it was found that the digit-symbol-substitution test, which
would appear to call for no high degree of manipulative dexterity,
was significantly correlated with the dexterity factor.
This finding put in doubt the identity and the nature of the
dexterity factor. It appeared plausible that the dexterity
factor might be the old hand-eye coordination ability so commonly
spoken of in earlier studies of mechanical testing. It was
suggested that the hypothesis that manual dexterity calls for visual
recognition and discrimination as well as for manipulative ability
could be tested by screening the manipulative work from the
testee’s field of vision. Test IX offers the possibility of a
preliminary test of this hypothesis.

In order to examine this possibility and to seek further evi- 
dence of manual dexterity as a valid functional classification 
of ability, Earle’s intercorrelations of the ten variables were sub- 
mitted to a factor analysis. It was found that two centroid 
factors, Table 6, permitted a satisfactory reconstruction of the 
intercorrelation table. The two factors were then subjected 
to a single orthogonal rotation and the nature of the rotated 
factors, as shown in Table 6, appears to be meaningful.

TABLE 6

                          Centroid Factors    Rotated Factors
                            I       II          A       B
I. .....................   .47    -.44         .65     .06
II. ....................   .41    -.30         .50     .11
III. ...................   .51    -.32         .58     .17
VI-I. ..................   .65     .31         .19     .70
VI-II. .................   .51     .24         .16     .54
VII. ...................   .49     .27         .12      ?
VIII. Pegs (handicap) ..   .59    -.06         .44     .40
IX. Pegs (not visible) .   .54     .18         .22     .53
XIII. ..................   .41     .30         .05     .51
Criterion ..............   .34    -.13          ?      .17
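The rotation angle used for Table 6 is not stated in the text. The code below sketches what a single orthogonal rotation of two centroid factors does. It estimates an angle by least squares from the legible rows of Table 6, using a two-dimensional orthogonal Procrustes fit. That fit is our assumption and not necessarily how the rotation was chosen. It then rotates each pair of centroid loadings through the fitted angle. Rows VII and Criterion are left out because one of their rotated values is illegible. The row labels and function names are illustrative.

```python
import math

# Legible rows of Table 6: (centroid I, centroid II, rotated A, rotated B).
TABLE_6 = {
    "I":     (0.47, -0.44, 0.65, 0.06),
    "II":    (0.41, -0.30, 0.50, 0.11),
    "III":   (0.51, -0.32, 0.58, 0.17),
    "VI-I":  (0.65,  0.31, 0.19, 0.70),
    "VI-II": (0.51,  0.24, 0.16, 0.54),
    "VIII":  (0.59, -0.06, 0.44, 0.40),
    "IX":    (0.54,  0.18, 0.22, 0.53),
    "XIII":  (0.41,  0.30, 0.05, 0.51),
}

def rotate(f1, f2, theta):
    """Rigidly rotate one test's pair of orthogonal factor loadings by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return f1 * c - f2 * s, f1 * s + f2 * c

def fit_angle(rows):
    """Angle minimizing squared misfit between rotated (I, II) and printed (A, B).

    Maximizes cos(t)*sum(a*f1 + b*f2) + sin(t)*sum(b*f1 - a*f2).
    """
    num = sum(b * f1 - a * f2 for f1, f2, a, b in rows)
    den = sum(a * f1 + b * f2 for f1, f2, a, b in rows)
    return math.atan2(num, den)

theta = fit_angle(TABLE_6.values())
print(f"estimated rotation angle: {math.degrees(theta):.1f} degrees")
for name, (f1, f2, a, b) in TABLE_6.items():
    ra, rb = rotate(f1, f2, theta)
    print(f"{name:6s} A {ra:.2f} (printed {a:.2f})   B {rb:.2f} (printed {b:.2f})")
```

With these rows, the fitted angle comes out a little under 50 degrees. Every rotated loading lands within about .03 of the printed value. A rigid rotation also leaves each test's communality (the sum of its squared loadings) unchanged.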





Factor A calls for the same class of operations as the ballistic 
or repetitive movement factor described in the first paper of 
this series. Earle’s test III, which calls for a simple, highly
speeded twisting or twirling movement of the fingers, is highly
correlated with Factor A. This finding suggests that the simple
repetitive movement of tapping involves the same ability as 
the more industrially significant repetitive twisting or twirling 
manual operations. Tests for the repetitive movement factor 
may conceivably be of considerable value in selecting workers 
for certain types of common industrial piece work employment. 
Factor B seems to call for the same type of manual facility
or manipulative ability which characterizes the dexterity factor 
revealed by the analyses of the Minnesota data. The tests 
differ from any of those used in the Minnesota study. One of 
them, test IX, does not permit the use of vision. Performance 
on test XIII depends upon accuracy of tactual discrimination. 
The analysis of Earle’s data not only confirms the tendency for 
measures of manipulative ability to be positively intercorre- 
lated but contributes to our understanding of this tendency. 
Tests calling for controlled placing, or adjusting movements, 
of the hands are interrelated as if they depended upon a single 
ability or capacity. This ability appears to be independent of 
the visual modality; it may be chiefly dependent upon the 
tactual and kinesthetic modalities. 

The tendency for tests involving manual dexterity to be 
more highly correlated among themselves than with other tests 
is manifested in yet another context. Teagarden (5) has
intercorrelated two Kent-Shakow scores, the Minnesota Spatial
Relations score, two Minnesota Rate of Manipulation scores, and two
scores from the Cincinnati Pliers Test. The tendency for the 
spatial visualization tests to form a correlation cluster different 
from the cluster formed by the dexterity tests is unmistakable. 


Conclusions 


Guidance experts and personnel technicians use tests which 
on the basis of the current statistical classifications are con- 
sidered to be measures of steadiness, repetitive movement, dex- 
terity, and spatial visualization. Yet, with the exception of 
the use of the spatial visualization tests, their procedures are 
justified chiefly on the basis of intuition and not on the basis 
of high correlations with external validating criteria, i.e.,
specific industrial performance. The literature abounds with
evidence of the validity of spatial visualizing ability tests for
prediction of mechanical work. Acceptable external validities of
manual dexterity tests are rarer, however. In general, the
most promising validity coefficients for manual dexterity tests
have been obtained with ratings of supervisors as a criterion.
Evidence of validity for the practical use of measures of
repetitive movement and steadiness is practically non-existent in
the literature.

The suggestion is offered that the common failure to vali- 
date tests of factors other than spatial visualization and scho- 
lastic ability (which at the adult level may be fractionated into 
other abilities) is probably due to the nature of the criteria that 
have been employed. Most of the criteria that have been em- 
ployed in the prediction of mechanical ability have been work 
samples prepared under unusual competition and other atypical 
conditions which appear to call for a much higher order of 
spatial visualizing judgment than manipulative ability, e.g., the 
criteria used in the Minnesota study. The so-called motor 
aspects of mechanical ability cannot be assumed to be of limited 
significance simply because their significance has not been rigor- 
ously demonstrated by suitable studies. If investigators em- 
ployed such criteria as satisfaction in work, duration of employ- 
ment in routine operations, speed of work, quality of specific 
operations, piece work output, breakage, fatigability and other 
factors of great practical significance in industrial operations, 
it might well be demonstrated that the motor abilities,
particularly manipulative ability, could, on the basis of demonstrated
predictive value, be granted a significant rôle in guidance and
selection procedures.

The term “mechanical ability” does not lend itself to ade- 
quate definition, however. In modern industrial employment 
there are innumerable different operations which involve the 
use of machines, tools and other mechanical contrivances. It 
appears likely that the successful prediction of satisfactory per- 
formance and good morale in these industrial activities is more 
dependent upon the development of adequate criteria than 
upon the invention of new ability tests. It is suggested that 
the greatest immediate progress in the field of mechanical 
ability testing depends upon extensive factor analytical studies
of interrelationships of criteria of different phases of industrial 
operations and at different levels and types of work. Unfor- 
tunately such varied criteria would not be available for most 
groups of industrial workers; certain paid apprenticeship or 
training groups would probably be the most desirable subjects 
for this research. It is only on the basis of such intensive
research that mechanical ability may be satisfactorily defined.
Definition of mechanical and manual work on any other basis 
is arbitrary and therefore not likely to be generally applicable. 

The studies of the present series demonstrate the commonly 
observed tendency for intercorrelated psychological variables 
to form clusters, i.e., to permit a somewhat rigorous mathe- 
matical classification of the variables. The question which con- 
tinually arises in a discussion of such studies as these is what 
significance may be ascribed to the classifications. The classi- 
fications which have been established by factor analysis could 
certainly be due to the sampling of the measures which are sub- 
jected to analysis. The sampling could be either a deliberately 
or an unconsciously obtained result. 

The test samples could be a function of our culture. Per- 
haps the human organism is physically capable of an indefinite 
variety of response patterns. If this were so, the culture could 
be regarded as determining which response patterns are of 
practical significance and through this mode of influence the 
culture could also determine the pattern of performances sam- 
pled by current psychological tests. In addition to this selec- 
tive effect, it is conceivable that a given culture may actually 
determine the development of abilities. Just as response pat- 
terns are elicited by life experiences, ability patterns may 
appear in a group of individuals in response to the exigencies 
of existence in the society. These cultural requirements may 
possess a structure which could be reflected in the organization 
of ability. The appearance of a pattern of ability among all of 
the individuals participating in a culture is expressed by the 
consistency in the society of intra-individual differences with 
respect to the various classes of ability. Currently the most 
satisfactory explanation of the development of intra-individual 
differences would rest upon learning theory. Learning theory is 
sufficiently well developed to enable us to envisage in a general 
way the manner in which differences in the environments of 
individuals could favor the development of intra-individual 
differences along lines which would reflect aspects of the culture. 

The great diversity of culture has been observed from group 
to group at different periods with respect to many important
attributes. It has not been shown to the writer’s knowledge, 
however, that the factorial pattern of ability varies meaning- 
fully from culture to culture. Until this diversity has been 
demonstrated, the arresting possibility remains that the organi- 
zation of ability may be a more or less standard pattern and 
a necessary consequence of the functional limitations of the 
human organism. A careful investigation of cultural differ- 
ences in ability patterns appears to be necessary for the de- 
velopment of a science of human ability. 

As previously mentioned, validity of tests for the various 
factors is directly dependent upon the type of criteria employed. 
Regardless of the nature of the criteria, however, it is frequently 
found that specially devised tests of rather anomalous factorial 
composition show higher validities than tests for known factors. 
The superior validity of tests which are specific to a task implies 
the existence of specific factors practically significant for the 
tasks. The ultimate significance of such hypothetical specific 
factors is unknown and probably rests in part upon the nature 
of their origin. Since the identifiable, stable factors are ap- 
parent among individuals at different age levels, it may be 
inferred that such abilities are relatively stable within the 
individual and therefore may be validly employed in long 
range predictions. It is possible, however, that specific abili- 
ties measured by certain tests are readily acquired in response 
to specific experiences and may have no great ultimate predic- 
tive value despite their value in predicting criteria established 
shortly after the application of the original test. More data
establishing the long-term predictive value of all of our tests
are greatly needed. Long-term predictions are always less
reliable than immediate ones. This effect is doubtless due
in part to the loss or acquisition of readily acquired, transient,
specific abilities.


Summary 


On the basis of factorial analyses of the Minnesota data and 
the examination of data of other studies of mechanical ability, 
it is apparent that a complete assay of an individual’s
potentialities for all types of mechanical or manual work would call
for measurement of at least the following attributes:


1. scholastic ability
2. spatial visualization
3. perceptual speed
4. manual dexterity
5. repetitive movement
6. steadiness
7. strength
8. size


The exact organization of the factors at different age levels 
requires further analytical study. The degree to which the 
importance of each of these abilities varies from job to job is 
unknown, but it is subject to critical determination. 
SPEED AND LEVEL COMPONENTS IN TIME-LIMIT
SCORES: A FACTOR ANALYSIS¹


WILLIAM M. DAVIDSON 
1st Lt., Army Air Corps Classification Section, Biggs Field, Texas
AND 
JOHN B. CARROLL 
Lt. (jg), USNR, Bureau of Medicine and Surgery 


WHETHER by force of tradition, or for reasons of expedience, 
it has been the practice to administer and score group tests of 
ability, aptitude, and achievement in such a way as to yield 
only a time-limit score, defined as the number of items cor- 
rectly answered within a specified length of time. Thus, the 
time-limit score often becomes the sole measure of the behavior 
represented in a test. When a test is “validated” with respect 
to some external criterion, a time-limit score, rather than some 
other type of score, is most likely to be used as the measure 
which is correlated with the criterion. Likewise, in making a 
factor analysis of a battery of tests, one is most likely to use 
time-limit scores. It is the writers’ belief that the indiscrimi- 
nate use of time-limit scores is one of the more unfortunate 
characteristics of current psychological testing since the time- 
limit score of a test frequently represents two relatively inde- 
pendent aspects of behavior: (a) the amount the subject knows 
or can perform (or in certain cases, the level of difficulty which 
he can reach), and (b) the rate at which the subject works. 
Somewhat at variance with current usage, we shall identify 
these aspects of test behavior, respectively, by the terms level 
and speed.² By ignoring the possibility that these two aspects
of test performance may play different rôles in any given situation,
the applied psychologist runs the risk of obtaining validity
coefficients lower than those which might be obtained if the
level and speed components were correctly weighted in the
prediction. For example, if the level score on a test has greater
validity than the speed score in predicting a criterion, use of the
time-limit score may tend to mask the potential validity of the
test by introducing the “dead wood” variance of the speed
component. In factor analysis there exists a real danger that the
primary factors which are found in a set of time-limit scores
may themselves be factorially complex, that is, that they may
consist of both speed and level components. When these
primary factors are correlated, as is frequently the case, one should
not consider the hypothesis that the correlations indicate the
presence of a general factor of intelligence until it is shown that
they are not due to the presence of an underlying speed factor.

¹ This paper is a revision of a thesis presented by the first-named author as a candidate for the M.A. degree at Indiana University, 1943.

² These terms were employed by Baxter (1), who was able to show a marked independence of speed and level in a single omnibus test of intelligence. The present study, in effect, extends Baxter’s approach to tests of varying content.
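A small calculation makes the “dead wood” argument concrete. Assume, hypothetically, three things: speed and level are uncorrelated standardized components, the time-limit score is their unweighted sum, and only level predicts the criterion. The validity of a weighted composite is then (w_L r_L + w_S r_S) / sqrt(w_L² + w_S²). The numbers below are illustrative, not data from this study, and the function name is hypothetical.

```python
import math

def composite_validity(w_level, w_speed, r_level, r_speed):
    """Validity of w_level*L + w_speed*S, where L and S are uncorrelated,
    unit-variance components with criterion correlations r_level and r_speed."""
    covariance = w_level * r_level + w_speed * r_speed
    sd_composite = math.sqrt(w_level ** 2 + w_speed ** 2)
    return covariance / sd_composite

# Hypothetical: level predicts the criterion at .60, speed not at all.
level_score = composite_validity(1.0, 0.0, r_level=0.60, r_speed=0.0)
time_limit_score = composite_validity(1.0, 1.0, r_level=0.60, r_speed=0.0)  # equal weights
print(f"level score: {level_score:.2f}   time-limit score: {time_limit_score:.2f}")
```

Under these assumptions, the speed variance cuts the validity by a factor of 1/√2, from .60 to about .42. Scoring level separately would recover the lost validity.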

It is true that a logical distinction between speed and level 
elements in test performance has long been recognized. How- 
ever, in practice it has been assumed that since these elements 
appear to be highly correlated they are merely different aspects 
of the same underlying entity and that consequently the dis- 
tinction can be ignored. Furthermore, it has been assumed 
that in any case the normal exigencies of group test adminis- 
tration preclude any attempt to make separate measurements 
of speed and level. Without undertaking to review the litera- 
ture on the problem, we believe that these assumptions will bear 
analysis. 

The assumption that speed and level are different aspects of 
the same thing has arisen partly through confusion in terms and 
partly through misinterpretation of the experimental evidence. 
The most frequent error is that of identifying time-limit scores 
as “speed” scores and then proceeding to cite correlations be- 
tween time-limit scores and scores obtained in unlimited time. 
The point has been missed that these correlations are spuriously 
high, since they rest on a part-whole relationship. The score 
obtained in unlimited time is equal to the time-limit score plus 
whatever the subject can accomplish in additional time. More- 
over, the correlation between these scores is a function of the 
length of the time-limit, for obviously as the time-limit is 
lengthened the time-limit scores become more similar to scores
obtained in unlimited time. In any case, correlations as high 
as even .7 or .8 are still not high enough to rule out the possi- 
bility that a speed component, independent of level, exists in 
the time-limit scores. 
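The part-whole point can be checked directly. Let the unlimited-time score be W = T + E, where T is the time-limit score and E is whatever the subject adds afterward. Then r(T, W) = (var T + cov(T, E)) / sqrt(var T · var W). The sketch below assumes, hypothetically, that E is uncorrelated with T. A shrinking var E stands in for a longer time-limit. The function name is illustrative.

```python
import math

def part_whole_r(var_t, var_e, cov_te=0.0):
    """Correlation between a part T and the whole W = T + E."""
    cov_tw = var_t + cov_te
    var_w = var_t + var_e + 2.0 * cov_te
    return cov_tw / math.sqrt(var_t * var_w)

# Hypothetical variances: smaller var_e corresponds to a longer time-limit.
for var_e in (1.0, 0.5, 0.25):
    print(f"var E = {var_e:.2f}   r(T, T + E) = {part_whole_r(1.0, var_e):.2f}")
```

Even with E entirely unrelated to T, the correlations are about .71, .82, and .89. These fall in the range the text warns about, and they arise from the overlap alone.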

The number of studies in which correlations have been 
obtained between rate-of-work scores and level scores is exceed- 
ingly small. The obtained correlations are seldom more than 
moderately high but even these have occasionally been cited as 
showing the fundamental identity of speed and level compo- 
nents in test performance. 

These misinterpretations have usually occurred in connec- 
tion with test performances in which the subjects vary con- 
siderably with respect to their ability to answer the items and 
in which the scores involve the number of items correctly 
answered. A particularly dangerous misinterpretation, how- 
ever, is likely to arise in connection with tests in which the sub- 
jects vary not in their item-passing ability, but only in their 
rate of performance. One frequently cited study is that carried 
out by Paterson and Tinker (7), who came to the perfectly 
sound conclusion that when corrected for attenuation, the cor- 
relation between “work-limit” and “time-limit” scores on a 
speed-of-reading test is virtually perfect. The work-limit score 
was not the number of items correctly performed in unlimited 
time but, instead, the time taken to read all the paragraphs in 
the test. The time-limit score was the number of paragraphs 
read within a time-limit. What Paterson and Tinker showed, 
then, was that in the measurement of a rate of performance 
it makes little difference whether the scores are expressed in 
terms of performance-per-unit-of-time or time-per-unit-of-per- 
formance. A convenient paradigm is that of a runner’s speed, 
which can be expressed either in terms of feet per second or in 
terms of seconds per foot. It is a mistake, however, to general- 
ize the results of the Paterson and Tinker study by inferring 
that “work-limit” and “time-limit” scores in the usual sense 
will be highly correlated in situations where elements of test 
performance other than rate are measured. 
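The runner paradigm can be illustrated numerically. The sketch below expresses the same hypothetical set of speeds in two ways, feet per second and seconds per foot, and correlates the two lists. The speeds and the helper name are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical runners timed at 4.0 to 6.0 feet per second.
feet_per_second = [4.0 + 0.1 * i for i in range(21)]
seconds_per_foot = [1.0 / v for v in feet_per_second]   # the same rates, inverted

print(f"r = {pearson_r(feet_per_second, seconds_per_foot):.3f}")
```

The correlation is very close to -1. Its sign is negative only because more seconds per foot means a slower runner. The small departure from -1 reflects the mild curvature of the reciprocal. Nothing in this calculation involves item-passing ability. That is why such a result says nothing about work-limit scores that count correct answers.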

With respect to the presumed impracticability of measuring 
speed and level components separately within a time-limit, we
can only point out that few attempts have been made to explore 
the problem. With ingenuity, it should be possible to devise 
relatively simple methods of making separate measurements 
even within a reasonable time-limit. 

We would by no means assert that speed and level components
of test performance are invariable entities from test to
test. Rate of performance in one task may be completely inde- 
pendent of rate of performance in another task. Similarly, level 
components undoubtedly vary from test to test. The investi- 
gation reported here establishes the independence of several 
distinct types of speed, in addition to a general speed factor, 
and previous factorial investigations have isolated several types 
of level components (such as vocabulary knowledge, ability to 
solve problems expressed verbally, etc.). 

We conclude this general introduction by making several 
recommendations in the fields of test construction and factor 
analysis. First, we suggest that persons responsible for the 
standardization and validation of tests experiment with the 
differential validities of speed and level scores and incorporate 
any significant findings in the directions for administering, scor- 
ing, and interpreting the tests. Investigations should be made 
of the possibility of restandardizing various published tests in 
terms of speed and level. Persons charged with selecting tests 
for use in given situations should give preference to tests which 
have been so standardized. Collateral experiments should 
meanwhile be directed towards discovering more efficient and 
reliable methods of measuring speed and level than, say, those 
employed in the present investigation. 

Our second major recommendation is that in factorial 
studies aimed at discovering unitary abilities, tests should be 
represented by speed scores, level scores, or both, and that if 
time-limit scores are to be studied at all they should be treated 
in the manner exemplified in the investigation reported here. 


The Experiment 


In order to establish the linear independence of speed and 
level scores it was decided to study by factor analysis a matrix 
of correlations between speed, level, and time-limit scores in a
number of short mental ability tests. As in Baxter’s study (1), 
speed scores were obtained as the number of seconds taken by 
the subject to work from the beginning to the end of the test, 
attempting every item once. Level scores were defined as the 
number of items correctly answered when the subject is allowed 
to take all the time he desires to try every item and to check 
over his work. Time-limit scores were defined as the number 
of items correctly answered within a prescribed time-limit. 
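The three scores defined above can be computed from one subject's paper roughly as follows. This is a sketch under stated assumptions. The ItemRecord fields and score_paper function are hypothetical, and the time-limit score counts rights only. The speed score is the time-limit plus the time the subject recorded after the X mark, following the procedure described below. Because the time-limit is the same for every subject, adding it does not change any correlation.

```python
from dataclasses import dataclass

@dataclass
class ItemRecord:
    """Hypothetical record of one item on one subject's paper."""
    within_limit: bool    # answered in ordinary pencil above the subject's X mark
    first_correct: bool   # that ordinary-pencil answer was right
    final_correct: bool   # right after all later work, including red-pencil corrections

def score_paper(items, time_limit_s, extra_s):
    """Return (time-limit score, speed score in seconds, level score).

    Rights-only counting is an assumption; a test with a prescribed scoring
    formula (e.g. a correction for guessing) would apply that formula instead.
    """
    time_limit_score = sum(1 for it in items if it.within_limit and it.first_correct)
    # Seconds from the start of the test to the end of one pass through every item:
    # the fixed time-limit plus the time the subject recorded after the X.
    speed_score = time_limit_s + extra_s
    level_score = sum(1 for it in items if it.final_correct)
    return time_limit_score, speed_score, level_score

# Hypothetical paper: four items, a 4-minute limit, 95 s recorded after the X.
paper = [
    ItemRecord(within_limit=True, first_correct=True, final_correct=True),
    ItemRecord(within_limit=True, first_correct=False, final_correct=True),   # corrected in red
    ItemRecord(within_limit=False, first_correct=True, final_correct=True),   # done after the X
    ItemRecord(within_limit=False, first_correct=False, final_correct=False),
]
print(score_paper(paper, time_limit_s=240, extra_s=95))
```

For this made-up paper the scores are a time-limit score of 1, a speed score of 335 seconds, and a level score of 3. The same paper thus yields three quite different pictures of the subject.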

The test battery consisted of the eight subtests of the Re- 
vised Alpha Examination, Form 5; the Minnesota Speed of 
Reading Test for College Students, Form A; and several tests 
which had been specially constructed for previous factorial in- 
vestigations. These included Letter Grouping and Scattered 
X’s, studied by Thurstone (8); and Phrase Completion and 
Disarranged Morphemes, constructed by Carroll (2,3). The 
Revised Alpha Examination was used because its subtests ap- 
pear to measure verbal, numerical, and reasoning factors, to 
judge from Guilford’s analysis of the original Army Alpha test 
(4), and because it is somewhat more practicable to administer 
and score than the original Army Alpha. Letter Grouping and 
Disarranged Morphemes were included to aid in defining the 
domain of reasoning ability. Scattered X’s was included to test 
the hypothesis that the Perceptual Speed factor (P) as mea- 
sured by the test might be involved in some of the speed scores 
studied here. The Minnesota Speed of Reading Test was in- 
cluded because it was believed that speed of reading might be 
related to speed scores on mental tests which contain reading 
material. 

Speed, level, and time-limit scores (as defined above) were 
obtained for each test or subtest in the battery, with three 
exceptions. For the Minnesota Speed of Reading Test, the 
only score obtained was the number of paragraphs marked, cor- 
rectly or incorrectly, in the prescribed 6-minute time-limit. 
This score measures the speed aspect of performance on the 
test. The score on Scattered X’s was the number of x’s found 
and marked in 4 minutes; again, this score is primarily a measure
of rate of performance. Phrase Completion had no time-limit
and was scored by means of a key the construction of
which has been described in a previous article (3). Special 
instructions and procedures were devised to obtain speed, level, 
and time-limit scores on the same test. 


A large clock with a sweep-second hand was placed in 
view of the subjects. First the subjects worked on a test 
for the prescribed time-limit, marking an “x” in the margin 
after the last answer written within the time-limit. They 
were then instructed as follows: “Continue working rapidly 
on the test to the end, but do not yet change any answers 
you have already written. As soon as you each individually 
finish the test, quickly look up at the clock and record at the 
bottom of the page the minutes and seconds you required to 
do the remainder of the test below the X. Do not stop too 
long on any one problem. You may guess at answers you 
don’t know or leave blanks. Write the time before you go 
back to fill blanks or make corrections. After you record 
your time, you may take your red pencil and make any addi- 
tions or corrections, but do not erase present answers.” The 
students were allowed to work on each test until all had 
finished, except on the Disarranged Morphemes and Letter 
Grouping tests, where a few students were not able to finish 
within 23 and 18 minutes, respectively, after the time-limit. 
This procedure yielded a time-limit score, which resulted 
from the application of the prescribed scoring formula to all 
answers written in ordinary pencil up to the X marked by 
the subject. The speed score was the number of minutes 
and seconds recorded at the bottom of each test. The level
score was the score on the entire test; if an answer in black 
was followed by a different one in red, the latter was taken 
as the answer for purposes of arriving at the level score. 
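The three scoring rules just described can be expressed in a few lines. This is an illustrative sketch, not the authors' procedure. It assumes simple number-right scoring, because the article says only that "the prescribed scoring formula" was applied. The `ItemResponse` record and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ItemResponse:
    """One item on one subject's answer sheet (hypothetical record layout)."""
    key: str                    # the correct answer
    pencil: Optional[str]       # answer written in ordinary pencil, or None if blank
    red: Optional[str] = None   # later addition or correction in red pencil, if any


def time_limit_score(items: List[ItemResponse], items_before_x: int) -> int:
    """Number right among pencil answers up to the subject's X (number-right assumed)."""
    return sum(1 for item in items[:items_before_x] if item.pencil == item.key)


def level_score(items: List[ItemResponse]) -> int:
    """Number right on the whole test; a red answer supersedes the pencil one."""
    def final_answer(item: ItemResponse) -> Optional[str]:
        return item.red if item.red is not None else item.pencil
    return sum(1 for item in items if final_answer(item) == item.key)


def speed_score(minutes: int, seconds: int) -> int:
    """Minutes and seconds recorded at the bottom of the page, in seconds."""
    return 60 * minutes + seconds


# A four-item sheet for one hypothetical subject.
sheet = [
    ItemResponse(key="a", pencil="a"),             # right, before the X
    ItemResponse(key="b", pencil="c", red="b"),    # wrong in pencil, corrected in red
    ItemResponse(key="c", pencil=None, red="c"),   # left blank, filled in red
    ItemResponse(key="d", pencil="a"),             # wrong, never corrected
]
print(time_limit_score(sheet, items_before_x=2))  # 1
print(level_score(sheet))                         # 3
print(speed_score(2, 15))                         # 135
```

The sketch makes visible why the level score includes the time-limit score. Every pencil answer counted before the X is still counted in the level score unless it is changed in red.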


The time-limits used for the subtests of the Revised Alpha 
Examination were those recommended by Tinker and Baker 
(9) for use with college students. For experimental purposes 
scores for two time-limits—2 and 4 minutes—were obtained for 
subtest 2, Arithmetical Reasoning. 

The subjects were undergraduate students in elementary 
experimental psychology at Indiana University. The analysis 
of test scores was based upon 91 complete cases—12 men and 
79 women. 

The markedly skewed distributions of certain speed scores 
(on the subtests Addition, Common Sense, Same-Opposite, and 
Disarranged Sentences in the Revised Alpha) were made more 
nearly normal by converting them to the reciprocals of the
number of seconds. Scores on all variables were coded in ten 
or fewer class intervals. Hollerith procedures were used to 
obtain Pearsonian product-moment coefficients. No correc- 
tions for grouping or attenuation were applied to the coeffi- 
cients. Before the correlation matrix was assembled for factor 
analysis, the level and time-limit scores on subtests Addition, 
Common Sense, and Disarranged Sentences were discarded, 
first, because most of the students made perfect scores in un- 
limited time, and second, because time-limit scores were very 
highly correlated with speed scores. Scattered X’s was omitted 
from the analysis because it was little correlated with any other 
variable in the battery. It was therefore concluded that the 
Perceptual Speed factor as measured by Scattered X’s is not 
significantly involved in the speed variables studied here. 
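The reciprocal transformation and the uncorrected Pearson coefficient can be sketched as follows. The completion times below are invented, positively skewed values used only for illustration; they are not the study's data. The helper names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def skewness(xs: List[float]) -> float:
    """Moment coefficient of skewness (population form)."""
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


def reciprocal_scores(seconds: List[float]) -> List[float]:
    """Convert times in seconds to reciprocals: larger values mean faster work."""
    return [1.0 / t for t in seconds]


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson product-moment r, with no correction for grouping or attenuation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical completion times with a long right tail (a few slow subjects).
times = [40, 42, 45, 47, 50, 55, 60, 90, 150]
rates = reciprocal_scores(times)
print(round(skewness(times), 2), round(skewness(rates), 2))
print(pearson_r(times, rates) < 0)  # True: a higher reciprocal means a shorter time
```

The transformation reverses the direction of the speed scale. The signs of any correlations involving transformed speed scores therefore flip relative to raw times.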


The Factor Analysis 


Level and time-limit scores, as defined here, are overlapping 
measures since the level score on a test can be regarded as equal 
to the time-limit score plus whatever additional correct answers 
the subject can give in time beyond the time-limit. There is 
likewise an obvious overlap between speed and time-limit scores 
since the faster the subject works the more items he has an 
opportunity to pass within the time-limit. It was believed that 
these factors of overlap would introduce spurious dimensions in 
the factor analysis if the correlation matrix were analyzed in 
the usual fashion. To put the matter differently, insertion of 
the time-limit scores would spuriously raise the communalities 
of the speed and level scores. A special method of factoring the 
matrix was suggested by Dr. L. R. Tucker. The main matrix, 
involving only speed and level variables, was analyzed in the 
usual way by the centroid method. All correlations between 
time-limit scores and speed scores or between time-limit scores 
and level scores were placed in a subsidiary matrix and factored 
separately. (See Table 1 for the correlations represented in the 
main and subsidiary matrices.) Correlations among time-limit 
variables were not analyzed at all. Essentially, the procedure 
involved locating the time-limit variables in the factor space 
defined by the speed and level variables. Factor loadings for
the variables in the subsidiary matrix were obtained by summing
the columns of correlations or residuals. The product of
the column sum and the value $(\Sigma r)^{-1/2}$ used to compute the mth
factor loadings for the main matrix (where $\Sigma r$ is the sum of all
entries in the main correlation or residual matrix) was the mth
factor loading for the subsidiary matrix variable. Residuals in the
subsidiary matrix were computed and treated in the usual way.

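The following is a minimal sketch of one centroid step, extended to subsidiary variables in the way just described. It makes three assumptions: matrices are plain Python lists, communality estimates already sit on the diagonal of the main matrix, and the sign reflections needed for later centroid factors are left out. The function name is hypothetical.

```python
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def centroid_step(main: Matrix, subsidiary: Matrix
                  ) -> Tuple[List[float], List[float], Matrix, Matrix]:
    """Extract one centroid factor from `main` and place subsidiary variables on it.

    main       -- symmetric speed/level correlation (or residual) matrix,
                  communality estimates on the diagonal
    subsidiary -- one row per time-limit variable: its correlations (or residuals)
                  with each main-matrix variable, in the same column order
    """
    n = len(main)
    col_sums = [sum(main[i][j] for i in range(n)) for j in range(n)]
    scale = 1.0 / sqrt(sum(col_sums))              # (sum of all entries) ** -1/2
    a = [s * scale for s in col_sums]              # loadings of main variables
    b = [sum(row) * scale for row in subsidiary]   # same multiplier for time-limit rows
    main_resid = [[main[i][j] - a[i] * a[j] for j in range(n)] for i in range(n)]
    sub_resid = [[row[j] - b[k] * a[j] for j in range(n)]
                 for k, row in enumerate(subsidiary)]
    return a, b, main_resid, sub_resid


# Toy check: a matrix generated by a single factor is recovered exactly.
true_a = [0.8, 0.6, 0.7]
main = [[x * y for y in true_a] for x in true_a]   # diagonal holds exact communalities
subsidiary = [[0.5 * y for y in true_a]]           # a time-limit variable loading .5
a, b, main_resid, sub_resid = centroid_step(main, subsidiary)
print([round(x, 3) for x in a], [round(x, 3) for x in b])  # [0.8, 0.6, 0.7] [0.5]
```

Correlations among the time-limit variables never enter the computation. This mirrors the decision to leave them unanalyzed.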

TABLE 2
The Centroid Matrix

Test     I      II     III    IV     V      VI     h²
  5     .32   -.37   -.10   -.18   -.05    .27    .36
  6     .43   -.40   -.09    .35    .18   -.18    .54
  7     .71    .05   -.41   -.18   -.17   -.11    .75
  8     .62    .10   -.34   -.16   -.04    .23    .59
  9     .65    .12   -.28   -.16   -.04   -.05    .52
 10     .63   -.36   -.10   -.18    .15    .10    .60
 11     .71   -.06   -.21   -.14    .07   -.07    .58
 12     .76   -.21   -.05   -.13    .05   -.06    .65
 13     .64    .18   -.11    .32   -.32   -.02    .66
 14     .39   -.16   -.23    .32   -.04   -.13    .35
 16     .56   -.24    .29    .19   -.13   -.10    .52
 18     .48    .44    .03   -.05    .32   -.04    .53
 20     .67   -.12    .42   -.22   -.18   -.05    .72
 21     .40    .18    .22   -.20    .15   -.24    .36
 22     .61    .11    .33   -.17   -.22   -.14    .59
 23     .65    .34    .28    .16   -.18   -.13    .70
 24     .26    .06    .24    .21    .21    .31    .31
 25     .51    .14    .35    .05    .10    .12    .43
 35     .58    .17   -.34    .05    .22   -.17    .56
 27     .69   -.23    .26    .22   -.05   -.22    .70
 28     .58   -.25    .36    .07    .02   -.06    .55
 30     .62    .44   -.23   -.20    .18    .26    .77
 32     .70   -.28    .23   -.25   -.09   -.01    .69
 33     .71   -.13   -.20   -.18    .15   -.17    .64
 34     .75    .15   -.11    .03   -.01    .02    .60
 36     .72    .26    .15    .26   -.07   -.09    .69
 38     .62    .13    .25    .18    .17    .10    .54








As shown in Table 2, six centroid factors were extracted.
The centroid matrix for the main correlational matrix was the
basis for the rotation to simple structure, which was accomplished
by Tucker’s semi-analytical method (10) in five trials.
The transformation matrix (Table 3) was used to obtain the
final rotated matrix (Table 4) both for the main and the
subsidiary matrix variables. The time-limit scores did not in any
way influence the rotational procedures; nevertheless, the vectors
for these scores were found to fit well in the simple structure
already established by the speed and level scores. Table 5
shows the correlations between the primary factors.


TABLE 3
The Transformation Matrix

         A      B      C      D      E      F
I       .33    .30    .16    .09    .18    .42
II     -.77    .09   -.16    .11    .32   -.11
III     .29    .73   -.24   -.18    .12   -.84
IV      .04    .29    .89   -.03    .20   -.05
V       .38   -.54    .07    .74    .43   -.15
VI      .27    .00   -.30   -.63    .79    .28

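Rotation amounts to post-multiplying each row of centroid loadings by the transformation matrix. Because that matrix was fixed from the speed and level variables alone, the time-limit rows are simply carried along by the same multiplication.

The sketch below applies the Table 3 values to the Table 2 row for test 10 (Number Series, speed). It checks only the arithmetic and does not reimplement Tucker's semi-analytical method. Because the printed entries are rounded, the results should agree with Table 4 only to within a few hundredths.

```python
from typing import List

FACTORS = "ABCDEF"

# Table 3: rows are centroid factors I-VI, columns are primary factors A-F.
TRANSFORM = [
    [0.33, 0.30, 0.16, 0.09, 0.18, 0.42],
    [-0.77, 0.09, -0.16, 0.11, 0.32, -0.11],
    [0.29, 0.73, -0.24, -0.18, 0.12, -0.84],
    [0.04, 0.29, 0.89, -0.03, 0.20, -0.05],
    [0.38, -0.54, 0.07, 0.74, 0.43, -0.15],
    [0.27, 0.00, -0.30, -0.63, 0.79, 0.28],
]


def rotate(centroid_row: List[float], transform: List[List[float]]) -> List[float]:
    """Projections of one variable on the rotated axes: row vector times matrix."""
    n_cols = len(transform[0])
    return [sum(c * transform[k][j] for k, c in enumerate(centroid_row))
            for j in range(n_cols)]


# Table 2, test 10 (Number Series, speed): loadings on centroid factors I-VI.
number_series_speed = [0.63, -0.36, -0.10, -0.18, 0.15, 0.10]
rotated = rotate(number_series_speed, TRANSFORM)
print({f: round(v, 2) for f, v in zip(FACTORS, rotated)})
```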

TABLE 4
The Rotated Factorial Matrix

Test  Variable                         A      B      C      D      E      F
Speed Scores:
  5   Addition ...................    .38   -.04   -.13   -.18    .06    .34
  6   Arith. Reasoning ...........    .46    .02    .54    .28   -.06    .22
  7   Common Sense ...............     ?    -.06    .04    .10   -.09    .63
  8   Same-Opposite ..............     ?    -.08   -.05   -.06    .25    .62
  9   Disarr. Sentences ..........     ?     .04    .02    .10    .04    .43
 10   Number Series ..............    .53   -.05   -.02    .09    .09    .41
 11   Verbal Analogies ...........     ?    -.01    .06    .16    .04    .46
 12   Directions .................    .38    .10    .05    .13    .02    .36
 13   Disarr. Morphemes ..........     ?     .39    .34   -.13    .10    .36
 14   Letter Grouping ............     ?     .04    .46    .12   -.08    .34
Level Scores:
 16   Arith. Reasoning ...........    .38    .50    .26   -.08   -.05   -.01
 18   Same-Opposite ..............   -.07    .05   -.01    .34    .32    .06
 20   Number Series ..............    .31    .53   -.16   -.14   -.03   -.02
 21   Verbal Analogies ...........    .02    .14   -.11    .30   -.02   -.10
 22   Directions .................    .10    .51   -.10   -.04   -.05   -.06
 23   Disarr. Morphemes ..........   -.08    .61    .18   -.02    .10   -.04
 24   Phrase Completion ..........    .28    .21    .08   -.06    .46   -.06
 25   Letter Grouping ............     ?     .38    .00    .01    .31   -.09
 35   Speed of Reading ...........    .02   -.16    .26    .38    .09    .46
Time-Limit Scores:
 27   Arith. Reasoning (2’) ......    .43    .46    .32    .08   -.06    .06
 28   Arith. Reasoning (4’) ......    .50    .46    .10   -.02    .04   -.08
 30   Same-Opposite (1’) .........   -.08   -.13   -.16    .14    .46    .46
 32   Number Series (2½’) ........    .48    .34   -.13   -.08   -.04    .14
 33   Verbal Analogies (2’) ......    .30   -.09    .05    .30   -.04    .42
 34   Directions (?) .............     ?     .18    .16    .10    .18    .38
 36   Disarr. Morphemes (8’) .....   -.04    .44    .28    .08    .18     ?
 38   Letter Grouping (8’) .......    .24    .36    .17    .07    .37    .02
TABLE 5
Correlations between the Primary Factors

         A       B       C       D       E       F
A     1.000   -.044   -.082   -.035   -.242    .002
B     -.044   1.000   -.422    .688    .052    .635
C     -.082   -.422   1.000   -.460    .051   -.401
D     -.035    .688   -.460   1.000    .130    .516
E     -.242    .052    .051    .131   1.000   -.036
F      .002    .635   -.401    .516   -.036   1.000





Interpretations of the Factors 


In interpreting the factors we follow the arbitrary rule that
a projection of .30 or larger indicates a significant loading of a
test on a factor.

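The ranking rule can be written as a short procedure. The sketch below assumes that "significant" means a positive projection of .30 or more, since every entry in the lists that follow is positive. It rebuilds the factor C list from a few rows of Table 4, omitting entries that are illegible in the source. The function name and dictionary layout are illustrative only.

```python
from typing import Dict, List, Tuple

# A few rows of Table 4 (legible entries only), keyed by variable.
ROWS: Dict[str, Dict[str, float]] = {
    "6 Arithmetical Reasoning (speed)": {"A": .46, "B": .02, "C": .54, "D": .28, "E": -.06, "F": .22},
    "13 Disarranged Morphemes (speed)": {"B": .39, "C": .34, "D": -.13, "E": .10, "F": .36},
    "14 Letter Grouping (speed)": {"B": .04, "C": .46, "D": .12, "E": -.08, "F": .34},
    "16 Arithmetical Reasoning (level)": {"A": .38, "B": .50, "C": .26, "D": -.08, "E": -.05, "F": -.01},
    "27 Arithmetical Reasoning (2' time-limit)": {"A": .43, "B": .46, "C": .32, "D": .08, "E": -.06, "F": .06},
    "28 Arithmetical Reasoning (4' time-limit)": {"A": .50, "B": .46, "C": .10, "D": -.02, "E": .04, "F": -.08},
}


def factor_list(rows: Dict[str, Dict[str, float]], factor: str,
                threshold: float = 0.30) -> List[Tuple[str, float, List[str]]]:
    """Variables projecting >= threshold on `factor`, largest first,
    each with its other significant projections formatted like '.46A'."""
    hits = [(name, loads[factor]) for name, loads in rows.items()
            if loads.get(factor, 0.0) >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    result = []
    for name, value in hits:
        others = [f"{v:.2f}".lstrip("0") + f for f, v in sorted(rows[name].items())
                  if f != factor and v >= threshold]
        result.append((name, value, others))
    return result


for name, value, others in factor_list(ROWS, "C"):
    print(f"{name:<44} {value:.2f}  {'; '.join(others) or '-'}")
```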
The variables having projections of .30 or greater on factor 
A are ranked below in order of size of projection. Significant 
projections on other factors are also given. 


                                                  Projections
No.  Test Variable                                A     Other factors
10   Number Series (speed) ...................   .53   .41F
28   Arithmetical Reasoning (4’ time-limit) ..   .50   .46B
32   Number Series (time-limit) ..............   .48   .34B
 6   Arithmetical Reasoning (speed) ..........   .46   .54C
27   Arithmetical Reasoning (2’ time-limit) ..   .43   .46B; .32C
 5   Addition (speed) ........................   .38   .34F
12   Directions (speed) ......................   .38   .36F
16   Arithmetical Reasoning (level) ..........   .38   .50B
20   Number Series (level) ...................   .31   .53B
33   Verbal Analogies (time-limit) ...........   .30   .30D; .42F


Most of these tests obviously have to do with simple arith- 
metical computation; factor A, then, appears to be the number 
factor N identified in previous studies. The speed scores of 
Arithmetical Reasoning and Number Series have higher load- 
ings than the corresponding level scores. The interpretation 
may be offered that for college adult subjects this factor refers 
to the speed aspect of computational behavior. The level of 
competence of these subjects is such that accuracy in arithmetic 
plays only an incidental rôle, although rapid arithmetical
ability appears to facilitate the correct performance of the
relatively complicated tasks set in Arithmetical Reasoning and
Number Series. The presence of Directions (speed) on this 
factor becomes understandable when it is noted that a
considerable share of the items involve numbers and numerical
operations.
Factor B has the following tests: 


                                                  B     Other
23   Disarranged Morphemes (level) ...........   .61   -
20   Number Series (level) ...................   .53   .31A
22   Directions (level) ......................   .51   -
16   Arithmetical Reasoning (level) ..........   .50   .38A
27   Arithmetical Reasoning (2’ time-limit) ..   .46   .43A; .32C
28   Arithmetical Reasoning (4’ time-limit) ..   .46   .50A; .10C
36   Disarranged Morphemes (time-limit) ......   .44   -
13   Disarranged Morphemes (speed) ...........   .39   .34C; .36F
25   Letter Grouping (level) .................   .38   .31E
38   Letter Grouping (time-limit) ............   .36   .37E
32   Number Series (time-limit) ..............   .34   .48A


In previous factorial studies tests similar to those represented 
above have been identified as tests of reasoning ability. Of 
the eleven variables listed, ten are directly or indirectly mea- 
sures of level of ability (time-limit scores being regarded as a 
function of both level and speed). In the light of these con- 
siderations, factor B may be identified as a Level of Reasoning 
factor. The present battery is too limited to indicate the rela- 
tion of this factor to the inductive and deductive reasoning 
factors which have been indicated in previous studies. The 
presence of the speed score of Disarranged Morphemes on this 
factor is interesting. In contrast to other tests in this battery, 
Disarranged Morphemes is of such a nature that it is almost 
impossible for a subject to be satisfied with an incorrect answer; 
the subject either solves an item correctly or is forced to skip it. 
Consequently, speed of performance in this task would be 
almost perfectly related to ability to answer the items if the 
subjects did not differ in their willingness to skip items. Be- 
cause of this inherent connection between speed and level 
aspects of performance on the Disarranged Morphemes test, it 
is not surprising to find the speed score present on the Level of 
Reasoning factor. Parenthetically, we may say that there are 
several subtle problems in this area which this study has not 
been designed to handle. For example, one would like to know 
how the speed-level relationship varies with the difficulty of the 
task and whether the relationship in the case of multiple-choice 
tests is essentially different from that in the case of tests where 
the subject is forced by the nature of the test to answer
correctly or not at all.
Factor C is represented by the following test variables: 


                                                  C     Other
 6   Arithmetical Reasoning (speed) ..........   .54   .46A
14   Letter Grouping (speed) .................   .46   .34F
13   Disarranged Morphemes (speed) ...........   .34   .39B; .36F
27   Arithmetical Reasoning (2’ time-limit) ..   .32   .43A; .46B


The tests represented here are reasoning tests also found in 
factor B. Factor C, however, is constituted by measures of 
speed. Level is not independently represented at all. Factor 
C may hence be regarded as a Speed of Reasoning factor. As 
will be shown later by multiple regression techniques, the time- 
limit scores of these reasoning tests are much more heavily 
weighted with level than with speed. It is not surprising that 
only one time-limit score (from the 2’ time-limit on Arithmetical
Reasoning) appears on factor C. The 4’ time-limit score
on this test has a loading of only .10 on C.

It is of interest to note from Table 5 that there is an appreci- 
able negative correlation between factors B and C. This proba- 
bly indicates that when other factors are ruled out, those who 
are hasty in performing these reasoning tests are likely to be 
inaccurate. 

Factors D and E lack definition in the present limited bat- 
tery. They are represented by the following variables: 


Factor D:                                         D     Other
35   Speed of Reading ........................   .38   .46F
18   Same-Opposite (level) ...................   .34   .32E
21   Verbal Analogies (level) ................   .30   -
33   Verbal Analogies (time-limit) ...........   .30   .30A; .42F

Factor E:                                         E     Other
24   Phrase Completion .......................   .46   -
30   Same-Opposite (time-limit) ..............   .46   .46F
38   Letter Grouping (time-limit) ............   .37   .36B
18   Same-Opposite (level) ...................   .32   .34D
25   Letter Grouping (level) .................   .31   .38B


Factor D may perhaps be characterized as a verbal reasoning 
factor which emphasizes formal relationships such as those of 
antonymity, genus-species, etc. Factor D is highly correlated 
with factor B, the Level of Reasoning factor. Were it not for 
the presence of both the level and time-limit scores of Letter 
Grouping on the factor, factor E might readily be interpreted
as the verbal factor identified in previous studies. 
Factor F is represented by the following variables: 


                                                  F     Other
 7   Common Sense (speed) ....................   .63   -
 8   Same-Opposite (speed) ...................   .62   -
11   Verbal Analogies (speed) ................   .46   -
35   Speed of Reading ........................   .46   .38D
30   Same-Opposite (time-limit) ..............   .46   .46E
 9   Disarranged Sentences (speed) ...........   .43   -
33   Verbal Analogies (time-limit) ...........   .42   .30A; .30D
10   Number Series (speed) ...................   .41   .53A
34   Directions (time-limit) .................   .38   -
12   Directions (speed) ......................   .36   .38A
13   Disarranged Morphemes (speed) ...........   .36   .39B; .34C
 5   Addition (speed) ........................   .34   .38A
14   Letter Grouping (speed) .................   .34   .46C


Every one of these variables involves either a direct or an indi- 
rect measure of speed. It is also true that with one exception 
every speed score in the battery appears in the above list. Only 
four time-limit scores are absent: those of Arithmetical Rea- 
soning, Number Series, Disarranged Morphemes, and Letter 
Grouping, and in these tests it can be shown that speed con- 
tributes little to the time-limit scores. Hence it may be con- 
cluded that factor F is a general speed factor involving rate of 
work in performance of tasks of the sort found in intelligence 
tests. The factor is similar to a general speed factor found in 
some of Holzinger’s studies (5,6). The content of a test does 
not seem to play any rôle in determining the loading of its speed
score on factor F, since tests of verbal, numerical, and reasoning 
abilities all appear in the above list. No definite conclusions 
can be drawn from the present data, however, as to whether this 
factor extends to both easy and difficult tasks. 

The presence of Speed of Reading on factor F might lead 
one to suspect that speed of reading is fundamentally involved 
in this factor. However, some of the tests whose speed scores 
measure the factor (e.g., Addition, Number Series, and Same- 
Opposites) do not have items containing connected text-mate- 
rial where a speed of reading factor could be expected to oper- 
ate. It appears, therefore, that an individual’s reading speed 
is partly a function of some more general speed factor. 
Multiple-Correlation Analysis


Although it is believed that factorial techniques provide a
more complete and concise analysis of the data, multiple-correlation
techniques can be used to evaluate the independent rôles
of speed and level in determining the variance of time-limit
scores. Table 6 presents such an analysis for all tests in the
battery for which speed, level, and time-limit scores were
obtained. The beta-coefficients indicate the relative contributions
made by speed and level in predicting time-limit scores.
In some tests, such as Arithmetical Reasoning, the time-limit
scores are chiefly a function of the level of item difficulty that
can be mastered by the subject, while in other tests, such as
Common Sense, the time-limit scores are primarily measures of
the subject’s rate of work. In still other tests, such as
Same-Opposite, speed and level are about equally weighted in the
time-limit score. These relationships depend to some extent
on the particular time-limits which had been set.


TABLE 6
Beta-Coefficients and Multiple Correlations in the Prediction of Time-Limit
Scores (T) from Speed (S) and Level (L) Scores

                                  Zero-order             Relative Contributions
                                  Correlations           Speed    Level
Test                              rTS    rTL    rSL      βTS      βTL      RT·SL
Alpha Examination:
  1. Addition .................  .809   .344   .220     .770     .174     .826
  2. Arith. Reas. (2’) ........  .482   .797   .439     .165     .725     .811
  2. Arith. Reas. (4’) ........  .323   .712   .439     .012     .706     .711
  3. Common Sense .............  .830   .296   .187     .804     .146     .843
  4. Same-Opposites ...........  .734   .616   .327     .596     .421     .835
  5. Disarranged Sentences ....    ?    .518   .308     .698     .303     .842
  6. Number Series ............  .673   .766   .485     .394     .575     .840
  7. Verbal Analogies .........  .831   .392   .338     .788     .125     .840
  8. Directions ...............  .649   .566   .415     .500     .358     .724
Disarranged Morphemes .........  .596   .782   .564     .227     .653     .804
Letter Grouping ...............  .451   .726   .137     .358     .677     .808

Even where the correlation between time-limit and level
scores is fairly high, the contribution of an independent speed
component to the time-limit score is sometimes fairly large
(e.g., in the case of Letter Grouping). The multiple correlations
in Table 6 are the correlations obtained in the prediction
of time-limit scores from level and speed scores. These multiple
correlations are in some cases considerably higher than the
corresponding zero-order coefficients. Nevertheless, there remains
in each case a certain amount of specific (unpredicted)
variance in the time-limit score which would militate against
the prediction of level scores, for example, from a weighted
combination of speed and time-limit scores.

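With only two predictors, the beta-coefficients and the multiple correlation in Table 6 follow from the three zero-order correlations by the standard formulas:

- $\beta_{TS} = (r_{TS} - r_{TL} r_{SL})/(1 - r_{SL}^2)$
- $\beta_{TL} = (r_{TL} - r_{TS} r_{SL})/(1 - r_{SL}^2)$
- $R^2_{T \cdot SL} = \beta_{TS} r_{TS} + \beta_{TL} r_{TL}$

The sketch below applies these formulas to the Addition row. The function name is hypothetical. Differences in the third decimal place are to be expected, because the printed correlations are themselves rounded.

```python
from math import sqrt
from typing import Tuple


def two_predictor_betas(r_ts: float, r_tl: float, r_sl: float) -> Tuple[float, float, float]:
    """Standardized weights for speed (S) and level (L) predicting the
    time-limit score (T), plus the multiple correlation R(T.SL)."""
    denom = 1.0 - r_sl ** 2
    beta_s = (r_ts - r_tl * r_sl) / denom
    beta_l = (r_tl - r_ts * r_sl) / denom
    r_mult = sqrt(beta_s * r_ts + beta_l * r_tl)
    return beta_s, beta_l, r_mult


# Table 6, Addition: rTS = .809, rTL = .344, rSL = .220.
beta_s, beta_l, r_mult = two_predictor_betas(0.809, 0.344, 0.220)
print(round(beta_s, 3), round(beta_l, 3), round(r_mult, 3))
# The unpredicted (specific) share of time-limit variance is 1 - R**2.
print(round(1 - r_mult ** 2, 3))
```

Even for Addition, where $R$ is among the highest in the table, roughly three-tenths of the time-limit variance remains unpredicted. This is the specific variance referred to above.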

Summary 


A number of relatively simple group mental tests were 
administered to 91 college students in such a way as to yield 
three types of score: speed, level, and time-limit. Speed scores 
represented the time required to attempt every item once; level 
scores represented the number of items correctly answered in 
unlimited time; and time-limit scores were the number of items 
correctly answered in a prescribed time-limit. Factor analysis 
revealed that in all cases speed scores were linearly independent 
of level scores and that time-limit scores could be represented 
as factorially complex measures having loadings on both speed 
and level dimensions of ability. Of the factors which were 
identified several were similar to verbal, numerical, and reason- 
ing factors isolated in previous factorial studies. In the domain 
of reasoning ability both level and speed factors were identified. 
A general speed factor involving nearly all of the speed scores 
was found. It is concluded that because of their factorial com- 
plexity, time-limit scores should be used with considerable 
caution both in factorial studies and in studies involving the 
prediction of criteria. 
THE USE OF AN OBJECTIVE TEST IN PREDICTING
RHETORIC GRADES 


IRWIN A. BERG, GRAHAM JOHNSON, AND ROBERT P. LARSEN


University of Illinois 


WHILE a passing grade in rhetoric or in an equivalent course 
is required of freshmen in virtually all colleges and universities, 
many institutions are exempting from the course those students 
who pass a proficiency examination. In addition, such pro- 
ficiency examinations are sometimes used to determine whether 
a student should be admitted to the regular course in rhetoric 
or assigned to a special non-credit rhetoric class. 

At the University of Illinois all entering freshmen are re- 
quired to take a rhetoric placement examination. Those stu- 
dents whose performance is high are granted credit in Rhetoric 
1 without taking the course. Those whose test performance is 
low are required to take Rhetoric 0, a non-credit course. Pro- 
vision is made for early transfer to Rhetoric 1 of any Rhetoric 0 
students whose classroom work proves to be at the level found 
in Rhetoric 1. All other students are entered in Rhetoric 1, the 
usual college course. In addition, a recent action of the Uni- 
versity’s Board of Trustees makes reasonable proficiency in 
written English a graduation requirement. Students earning 
grades of “C” or “D” in Rhetoric 2 are required to pass a special 
examination or to pass a third course in Rhetoric before being 
granted a bachelor’s degree. 

The Rhetoric Placement Examinations at the time of this
study in October, 1943, consisted of an objective test¹ and an
impromptu theme written in the examination room. The
actual decision as to whether students were assigned to Rhetoric
0 or Rhetoric 1 or passed for proficiency was made on the
basis of the quality of the impromptu theme. Two rhetoric
staff members individually evaluated each theme for allocation
to one of the three groups. If the two instructors disagreed,
the theme was given to a third instructor who made the final
decision. The third instructor could consult the objective test
score in such contested cases when the objective test results
were available. The objective tests were scored by an International
Business Machines electrical scoring machine.

¹ Cooperative Test A: Mechanics of Expression, Form Q.

The object of this study is to make inquiry into the useful- 
ness of an objective test in predicting rhetoric achievement in 
college. Upon initial examination it may appear that such an
inquiry could not prove fruitful since the rhetoric course grade 
is mainly determined by grades on what is often thought of as 
subjectively evaluated compositions. Thus a sample compo- 
sition graded in the same manner might be considered an emi- 
nently more suitable predictive instrument. As Kelley and 
Roberts put it: 


“We have found that ability to detect and correct errors
in exercises is not always accompanied by the ability to
avoid similar or worse errors in original composition, and
that, conversely, students really proficient in composition
may have indifferent success on a problem-solving type of
test. We hold firmly the conviction that a student’s degree
of proficiency in writing can be determined only by a
demonstration of that proficiency, in writing.”²

Yet the advantages of rapid scoring which could be done by 
persons who are not necessarily rhetoric instructors, together 
with the advantages of objectivity of score, would make the 
use of a suitable objective test an extremely practical measur- 
ing tool. 

Two groups were used in this study. Group 1 numbered 
372 students and group 2 numbered 166. Both groups were 
composed of freshmen entering the College of Liberal Arts and 
Sciences in the fall of 1943. The groups were not at first com- 
bined because each was tested in a different room and by differ- 
ent examiners, although the day and hour were the same for 
both groups. As will be noted later, this precaution was
unnecessary; consequently only in Table 2 are the groups
separated. In analyzing the data from these groups, Pearson
product-moment correlations and other simple statistical measures
were used. The grades of Rhetoric 0 students were lowered one
grade point (i.e., from B to C) and the grades of students who
were passed for proficiency in rhetoric were tabulated as B.
This was in accordance with the recommendation made by
members of the rhetoric staff.

² Kelley, Cornelia and Roberts, Charles. “Rhetoric Proficiency Tests.” Illinois
English Bulletin, XXXI (1944), 2.

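The adjustment of the criterion grades described above amounts to a small lookup. In this sketch the section labels are hypothetical. How an E in Rhetoric 0 was handled is an assumption: the article does not say, and here an E, which cannot be lowered further, is left at E.

```python
GRADE_ORDER = ["A", "B", "C", "D", "E"]


def criterion_grade(section: str, grade: str = "") -> str:
    """Letter grade used as the criterion, following the rhetoric staff's recommendation.

    section -- "Rhetoric 1", "Rhetoric 0", or "Proficiency" (hypothetical labels)
    grade   -- the recorded course grade (ignored for proficiency students)
    """
    if section == "Proficiency":
        return "B"                                   # passed for proficiency: tabulated as B
    if section == "Rhetoric 0":
        i = GRADE_ORDER.index(grade)
        # Lowered one grade point; E stays E (an assumption, not stated in the article).
        return GRADE_ORDER[min(i + 1, len(GRADE_ORDER) - 1)]
    if section == "Rhetoric 1":
        return grade
    raise ValueError(f"unknown section: {section!r}")


print(criterion_grade("Rhetoric 0", "B"))   # C
print(criterion_grade("Proficiency"))        # B
print(criterion_grade("Rhetoric 1", "A"))    # A
```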

Results 


TABLE 1
Mechanics of Expression Test Scores in Relation to Grades in Rhetoric

Rhet. 1 Grade            N      Mean     S.D.    C.R.
A ...................    25    142.7      9.8
B ...................   154    117.6     17.4    10.5 (A & B)
C ...................   216     98.4     20.2
D ...................    43     83.1     19.9     4.6 (C & D)
E ...................     8     73.3     29.5      .9 (D & E)
All Grades Rhet. 1 ..   446    105.4     24.2
Rhetoric 0 ..........    29     69.4     16.3     3.2 (D & Rhet. 0)
Pass Proficiency ....    62    134.5     15.3     3.0 (A & Prof.)
TABLE 2
Correlation of Grades with Test Scores

                                                    Group 1         Group 2
                                                    N      r        N      r
Mech. Exp. Scores and all Rhetoric grades* ......   372   .683     166   .696
Mech. Exp. Scores and Rhet. 1 grades only† ......   350   .639     159   .634
Mech. Exp. Scores and Rhet. 1 grades without
  proficiency groups‡ ...........................   313   .618     133   .607
Mech. Exp. Scores and Grade Point Average
  for all courses§ ..............................   342   .541     162   .532
Mech. Exp. Scores and A.C.E. Total Scores‖ ......   342   .659     162   .657
A.C.E. Total Scores and Rhetoric 1 grades¶ ......   302   .442     139   .465

* Grades of students “passed for proficiency” in Rhetoric 1 are taken as “B.”
Rhetoric 0 grades are lowered one grade point. Mech. Exp. is the abbreviation for
Cooperative Test A: Mechanics of Expression, Form Q.

† Rhetoric 0 grades are not included.

‡ Rhetoric 0 and “passed for proficiency” grades not included.

§ Only those students who earned grades in courses totaling 12 or more semester
hours are included.

‖ A.C.E. refers to the American Council on Education Psychological Examination,
1940 edition.

¶ Rhetoric 0 grades are not included. Some Rhetoric 1 students did not take
the A.C.E. examination.
TABLE 3
Types of Errors Checked in Early Freshman Compositions of 147
Rhetoric 1 Students*

                                                 Total No. of   Per Cent of Students
Type of Error                                    Violations     Making One or More Errors
1. Grammar (i.e., sentence fragment,
   incorrect …) ..............................       364            33.9
2. Mechanics (i.e., capitalization, italics) ...       298            44.1
3. Punctuation (i.e., superfluous comma,
   hyphen in compounds) ........................      1153            50.2
4. Spelling ....................................       703            86.0
5. Diction (i.e., exactness, wordiness,
   faulty …) ...................................      1184            75.9
6. Coherence (i.e., word order, dangling
   modifiers …) ................................       636            39.7

* Summarized from Johnson, W. G. and Mathews, E. G. “Errors most frequently
checked in early freshman compositions.” Illinois English Bulletin, XXXI (1944),
1-8.


Discussion 


The curious aspect of these data is not that the objective 
test used in this study is a good predictor of achievement in 
Rhetoric 1. Admittedly, correlations above .60 between course 
grades and pre-test scores are not common. The more perti- 
nent question is why is the correlation so high? Rhetoric 1 is 
a course in which the final grade is largely determined by grades 
earned on written themes in which the instructor’s evaluation 
includes what may be considered subjective aspects such as 
triteness, wordiness, or lack of logic. The purely objective test,
on the other hand, measures only basic skills in grammar, 
spelling, punctuation, and capitalization. There can be little 
doubt that the objective test does bear a clear relation to grades 
earned. In Table 1 it will be seen that there is a progressive 
decrease in the mean objective test scores by grades from “A” 
through “E.” Rhetoric 0, composed of students deemed too 
poorly prepared to enter Rhetoric 1, has the lowest mean of all 
categories. The group “passed for proficiency” has a mean 
score between those of the “A” and “B” grade students. Sev- 
eral Rhetoric 1 instructors agreed beforehand that the pro- 
ficiency students would probably fall at this level. The critical 
ratios of the differences between the test means of most grade 
categories indicate that the differences are highly significant. 
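The critical ratios in Table 1 appear to have been computed in the usual large-sample way. Each is the difference between two independent means divided by its standard error, $\sqrt{s_1^2/N_1 + s_2^2/N_2}$. The article does not state this formula, so the sketch below assumes it.

With the printed means, standard deviations and N's, it reproduces 4.6 (C & D), .9 (D & E), 3.2 (D & Rhet. 0) and 3.0 (A & Prof.). For A & B it gives about 10.4, against the printed 10.5.

```python
from math import sqrt


def critical_ratio(mean1: float, sd1: float, n1: int,
                   mean2: float, sd2: float, n2: int) -> float:
    """Difference between two independent means divided by its standard error."""
    se_diff = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


# Table 1 values as (mean, S.D., N).
A = (142.7, 9.8, 25)
C = (98.4, 20.2, 216)
D = (83.1, 19.9, 43)
E = (73.3, 29.5, 8)
RHET_0 = (69.4, 16.3, 29)
PROFICIENCY = (134.5, 15.3, 62)

print(round(critical_ratio(*C, *D), 1))  # 4.6
print(round(critical_ratio(*D, *E), 1))  # 0.9
```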
An inference may be drawn from the data in Table 3 which
partially explains why the predictive value of an objective test 
should be high for a course the content of which is ordinarily 
thought of as being subjectively graded. It will be noted that 
the majority of the errors of early freshman compositions were 
largely objectively ascertained errors, i.e., errors of spelling, 
punctuation, mechanics, and grammar. Significantly, it is this 
type of error which the objective test measures. In practice, 
then, the majority of the errors checked are detected objec- 
tively. 
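The claim that most checked errors were objectively ascertained can be checked by simple arithmetic on the violation counts in Table 3. The sketch follows the grouping used in this discussion: spelling, punctuation, mechanics and grammar count as objective, while diction and coherence do not. It takes the fourth category of Table 3 to be spelling, matching the discussion's list of error types.

```python
# Table 3: (type of error, total number of violations, objectively ascertained?)
ERRORS = [
    ("Grammar", 364, True),
    ("Mechanics", 298, True),
    ("Punctuation", 1153, True),
    ("Spelling", 703, True),
    ("Diction", 1184, False),
    ("Coherence", 636, False),
]

objective = sum(n for _, n, is_objective in ERRORS if is_objective)
total = sum(n for _, n, _ in ERRORS)
print(objective, total, round(objective / total, 3))  # 2518 4338 0.58
```

On this grouping, a little under three-fifths of all checked violations are of the kind the objective test measures.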

The weight, or value, rhetoric instructors assign to purely 
objective errors is probably quite high. A hypothetical case 
may demonstrate the importance of the value assigned to a 
given error. Suppose that two students each have exactly the
same number of errors checked on their themes. The first student
makes errors only of exactness and wordiness, while the second
student's errors are misspellings such as "alright" and grammatical
lapses such as "between him and I" and "he come around the corner."
It is likely that the rhetoric instructor would consider the latter
errors abominations not to be lightly dismissed. Presumably the grade
of the first student would be significantly higher than that of the
second, although the number of errors was the same in both cases.

In other instances the instructor may grade compositions on 
a purely objective basis because he can do little else. Most 
of the themes may, at times, approximate each other in errors 
of diction, coherence, and organization while the variation be- 
tween compositions in spelling, punctuation, etc., is marked. 
Also, the instructor may be influenced by the fact that such 
errors are more easily detected and are less likely to be con- 
tested by the student. 

Another interpretation of these data might propose that 
there exists a parallel development of the mastery of English 
mechanics and of effectiveness in expression. Thus, skill in 
mechanics would tend strongly to be associated with variety 
of sentence structure, freshness of treatment, and superior 
organization. Conversely, lack of mechanical accuracy would 
tend strongly to be associated with a less effective presentation 
of material. It might then be said that rhetoric instructors are
accurate in evaluating both the mechanics and the effectiveness 
of composition. Since both aspects of writing are positively 
correlated, a false impression is given that the purely objective 
errors are emphasized. This interpretation would be strength- 
ened by the fact that early freshman compositions, such as 
those recorded in Table 3, are very carefully scored for mechani- 
cal errors. Because most students would presumably improve 
in mechanics, the rhetoric instructor could place greater empha- 
sis, when scoring later compositions, upon interest, originality, 
and freshness of treatment. This explanation would probably 
be favored by rhetoric staff members. 

But it must be emphasized that, while such parallel develop- 
ment may exist to some extent, instructors would agree closely 
on scoring for mechanical accuracy but not on scoring for origi- 
nality or superiority in organization. Yet the correlation be- 
tween a test of basic mechanical skills given at the beginning 
of the semester with final rhetoric course grades at the end of 
the semester is almost .70. It would seem that if the more or 
less subjectively scored aspects of compositions are given much 
weight the correlation should be considerably lower, since varia- 
bility in scoring is greater. 
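The reasoning in this paragraph can be made concrete with Spearman's classical attenuation relation, which the authors do not cite. Under that relation, the correlation observed between two measures is the true-score correlation multiplied by the square root of the product of their reliabilities, so less consistent scoring lowers the ceiling on any observed correlation. The Python sketch below illustrates that assumption only. The reliability values are hypothetical and are not estimates from this study.

```python
import math


def attenuated_r(true_r, rel_test, rel_criterion):
    """Expected observed correlation under Spearman's attenuation relation:
    r_observed = r_true * sqrt(r_xx * r_yy)."""
    return true_r * math.sqrt(rel_test * rel_criterion)


# Hypothetical reliabilities: the same objective test (0.90) paired with
# grades that are scored more or less consistently.
rel_test = 0.90
for label, rel_grades in [("grades scored mainly on mechanics", 0.85),
                          ("grades scored mainly on originality", 0.50)]:
    ceiling = attenuated_r(1.0, rel_test, rel_grades)  # true_r = 1 gives the ceiling
    print(f"{label}: highest attainable r = {ceiling:.2f}")
```

On this reading, grading that rests heavily on the less consistently scored aspects would be expected to pull the test-grade correlation down, which is the direction of the authors' argument.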

In general, it may be said that the high-school preparation 
of many students is inadequate insofar as mastery of basic skills 
in the mechanics of expression is concerned. It may be that a 
universal college admission requirement of four years of high- 
school English would improve the situation. Perhaps a re- 
quired level of proficiency on a rhetoric test could be made part 
of the admission procedure. Such measures could be adopted 
only if it were agreed that the main function of a college rheto- 
ric course is practice in English composition, not drill in punctu- 
ating such compositions. It is probable that the predictive 
value of an objective test in rhetoric will remain high as long 
as many students enter college with inadequate grounding in 
English mechanics. The correlation of such tests with grades 
should become progressively lower as students enter rhetoric 
classes with more thorough preparation in English. 

This lack of thorough training in so-called “drill subjects” 
may be reflected in other areas. It is not uncommon to en-
counter students, for example, who do not know the multipli- 
cation table. Occasionally one may discover a college student 
who has not learned the exact order of the letters of the alpha- 
bet. Whether or not mastery of “drill subjects” should be 
expected of entering college students cannot be discussed here. 
But it seems clear, in the case of mechanics of English at least, 
that such lack of proficiency does exist. 


Summary 


Data from 538 University of Illinois Liberal Arts College 
freshmen in Rhetoric 1 were analyzed. It was found that the 
scores of an objective test in the mechanics of expression corre- 
lated as high as .69 with final grades in rhetoric. Critical ratios 
of the differences in the mean objective test scores of students 
earning final grades of “A,” “B,” “C,” etc., were significant. 
The question was raised as to why course grades which were 
presumably determined subjectively should correlate highly 
with scores on an objective test. The explanation advanced 
was that the preparation of many students was such that the 
rhetoric instructor was forced to grade largely on the basis of 
errors in mechanics. It was further suggested that instructors
probably view objectively ascertained errors such as "he don't" and
"alright" more sternly than the more subjectively determined faults
such as triteness and lack of freshness of treatment. Also, it
was considered possible that the instructor may have been in- 
fluenced when grading English compositions by the fact that 
purely objective errors are more easily detected and that the 
student is less likely to contest a grade based upon objective 
errors. The possibility that a parallel development exists 
between mastery of English mechanics and effectiveness of ex- 
pression was considered but judged to be inadequate as an 
explanation. 




















A QUICK GRAPHIC METHOD FOR PRODUCT
MOMENT "r"


WILLIAM LEROY JENKINS 
Lehigh University¹


The product-moment coefficient of correlation can be determined
graphically in a fraction of the time required to compute it. With
large samples the two methods give virtually identical results. With
small samples, graphically determined coefficients appear to be about
as representative of the relationship in the total population from
which the sample was drawn as computed coefficients are.

The graphic method can be applied directly to raw data 
without grouping into class intervals. The correctness of the 
solution can be readily checked by inspection. 

The method depends on the following relation between the
coefficient of correlation (r) and a ratio (J) which can be
determined graphically:

$$ J = \sqrt{\frac{1 + r}{1 - r}} $$
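The relation and its inverse are easy to tabulate. The following is a minimal Python sketch with hypothetical function names. It also computes Fisher's z, which reduces to ln J because z = ½ ln((1 + r)/(1 − r)) = ½ ln J². The printed rows can be compared with Table 1.

```python
import math


def j_from_r(r):
    """Ratio J for a given r, from J = sqrt((1 + r) / (1 - r))."""
    return math.sqrt((1 + r) / (1 - r))


def r_from_j(j):
    """Inverse relation, r = (J**2 - 1) / (J**2 + 1)."""
    return (j * j - 1) / (j * j + 1)


def fisher_z_from_j(j):
    """Fisher's z = 0.5 * ln((1 + r) / (1 - r)), which equals ln J."""
    return math.log(j)


# A few rows in the layout of Table 1: J, r, z.
for j in (1.1, 2.0, 3.0, 5.0):
    print(f"{j:.1f}  {r_from_j(j):.3f}  {fisher_z_from_j(j):.3f}")
```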


Procedure (See Figure 1) 





0. Plot the scattergraph directly from the raw data without 
grouping. 

1. Move a straightedge from the top of the scattergraph, 
keeping it parallel to the x-axis, until 16% of the plotted points
show above the straightedge. Through the latest point to 
appear draw a line parallel to the x-axis. 

2. Move a straightedge from the right side of the scatter-
graph, keeping it parallel to the y-axis, until 16% of the plotted
points show to the right of the straightedge. Through the latest
point to appear draw a line parallel to the y-axis.

¹ On leave until January 1, 1946, with Columbia University, Division of War
Research, Submarine Training Section, Box 34, Submarine Base, New London,
Connecticut.

3. Move a straightedge from the bottom of the scatter- 
graph, keeping it parallel to the x-axis, until 16% of the plotted 
points show below the straightedge. Through the latest point 
to appear draw a line parallel to the x-axis. 

4. Move a straightedge from the left side of the scatter-
graph, keeping it parallel to the y-axis, until 16% of the plotted 
points show to the left of the straightedge. Through the latest 
point to appear draw a line parallel to the y-axis. 

5, 6. Draw the diagonals of the rectangle formed by lines 
1, 2, 3, 4. 

7. Move a straightedge from the upper right corner of the 
scattergraph, keeping it parallel to diagonal 5, until 8% of the 
plotted points show beyond the straightedge. Through the
latest point to appear draw a line parallel to diagonal 5. Move 
the straightedge in until 16% of the plotted points show beyond 
the straightedge. Through the latest point to appear draw a 
line parallel to diagonal 5. 

8. Move a straightedge from the lower right corner of the 
scattergraph, keeping it parallel to diagonal 6, until 8% of 
the plotted points show beyond the straightedge. Through the 
latest point to appear draw a line parallel to diagonal 6. Move 
the straightedge in until 16% of the plotted points show beyond 
the straightedge. Through the latest point to appear draw a 
line parallel to diagonal 6. 

9. Move a straightedge from the lower left corner of the 
scattergraph, keeping it parallel to diagonal 5, until 8% of 
the plotted points show beyond the straightedge. Through 
the latest point to appear draw a line parallel to diagonal 5. 
Move the straightedge in until 16% of the plotted points show 
beyond the straightedge. Through the latest point to appear 
draw a line parallel to diagonal 5. 

10. Move a straightedge from the upper left corner of the 
scattergraph, keeping it parallel to diagonal 6, until 8% of 
the plotted points show beyond the straightedge. Through 
the latest point to appear draw a line parallel to diagonal 6. 
Move the straightedge in until 16% of the plotted points show
beyond the straightedge. Through the latest point to appear
draw a line parallel to diagonal 6.

11. With a millimeter scale measure the sides of the parallelogram
formed by lines 7’, 8’, 9’, 10’ (the 16% lines). Divide the longer by
the shorter side to get the ratio J’.

12. With a millimeter scale measure the sides of the parallelogram
formed by lines 7”, 8”, 9”, 10” (the 8% lines). Divide the longer by
the shorter side to get the ratio J”.

13. Take the mean of J’ and J” to get J. Interpolate in Table 1 to
get the value of r, or use the equation:

$$ r = \frac{J^2 - 1}{J^2 + 1} $$


TABLE 1
Conversion of J to r

 J    r       z*       J    r       z*       J    r       z*       J    r       z*
1.0  .000   .000     2.3  .682   .833     3.6  .857  1.281     4.9  .920  1.589
1.1  .095   .095     2.4  .704   .876     3.7  .864  1.308     5.0  .923  1.609
1.2  .180   .182     2.5  .725   .916     3.8  .870  1.335     5.1  .926  1.629
1.3  .256   .262     2.6  .743   .956     3.9  .877  1.361     5.2  .929  1.649
1.4  .324   .337     2.7  .759   .993     4.0  .883  1.386     5.3  .931  1.668
1.5  .385   .406     2.8  .774  1.030     4.1  .888  1.411     5.4  .934  1.686
1.6  .438   .470     2.9  .788  1.065     4.2  .893  1.435     5.5  .936  1.705
1.7  .486   .531     3.0  .800  1.099     4.3  .898  1.459     5.6  .938  1.723
1.8  .529   .588     3.1  .811  1.131     4.4  .902  1.482     5.7  .940  1.741
1.9  .566   .642     3.2  .822  1.163     4.5  .906  1.504     5.8  .942  1.758
2.0  .600   .693     3.3  .832  1.194     4.6  .910  1.526     5.9  .944  1.775
2.1  .630   .742     3.4  .841  1.224     4.7  .913  1.548     6.0  .946  1.792
2.2  .658   .789     3.5  .849  1.253     4.8  .917  1.569

* See R. A. Fisher’s “Statistical Methods for Research Workers,” pp. 202-215.
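The procedure can be imitated numerically, which makes its logic easy to check. The following Python sketch is an illustration under stated assumptions and is not the author's method:

- Each instruction to move a straightedge until p% of the points show becomes a percentile. Note that np.percentile interpolates between points rather than passing through the last point to appear.
- Diagonal 5 is taken to run from upper left to lower right, and diagonal 6 from lower left to upper right, as steps 7-10 imply.
- The sign of r, which the procedure does not discuss, is read from the diagonal along which the points lie.
- The function name graphic_r is hypothetical.

The ratio of the parallelogram's longer to shorter side equals the ratio of the larger to the smaller perpendicular spacing between its two pairs of parallel lines. Those spacings therefore stand in for the millimeter measurements of steps 11 and 12.

```python
import numpy as np


def graphic_r(x, y, cuts=(16, 8)):
    """Numerical analogue of the graphic method, steps 1-13 (hypothetical helper).

    Counting points past a straightedge is replaced by percentiles of the
    points' positions along the relevant direction.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def spread(values, p):
        # Distance between the lines that cut off p% of the points on each side.
        lo, hi = np.percentile(values, [p, 100 - p])
        return hi - lo

    # Steps 1-4: sides of the rectangle (about 2 SD each for normal data).
    w, h = spread(x, 16), spread(y, 16)
    norm = np.hypot(w, h)
    # Steps 5-6 (interpretation): diagonal 5 has direction (w, -h), diagonal 6 (w, h).
    across_5 = (h * x + w * y) / norm   # signed distances from lines parallel to diagonal 5
    across_6 = (h * x - w * y) / norm   # signed distances from lines parallel to diagonal 6

    # Steps 7-12: longer/shorter spacing gives J' (16% lines) and J'' (8% lines).
    ratios = []
    for p in cuts:
        s5, s6 = spread(across_5, p), spread(across_6, p)
        ratios.append(max(s5, s6) / min(s5, s6))

    # Step 13: mean ratio, then r = (J^2 - 1) / (J^2 + 1).
    j = float(np.mean(ratios))
    r = (j * j - 1) / (j * j + 1)
    # Sign (assumption): positive when the points are spread widely across
    # diagonal 5, i.e. lie along diagonal 6 (lower left to upper right).
    return r if spread(across_5, 16) >= spread(across_6, 16) else -r


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    y = 0.76 * x + np.sqrt(1 - 0.76 ** 2) * rng.standard_normal(1000)
    print("computed r:", round(float(np.corrcoef(x, y)[0, 1]), 3))
    print("graphic r: ", round(graphic_r(x, y), 3))
```

With simulated normal data the two printed values should lie close together. How close depends on the sample.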


Mathematical Proof (See Figure 2) 


Commander H. S. Sharp, USCG, has been good enough to supply this mathe- 
matical support for what was originally a purely empirical method. 
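Lines 5 and 6 are located by estimating 2σx and 2σy (68% of the points) and
drawing the diagonals. Therefore, tan α = σy/σx. The ratio J’ is similarly obtained by
estimating the standard deviations (σa, σb) about lines 5 and 6. Therefore, with x
and y measured from their means,

$$ J'^2 = \frac{\sigma_a^2}{\sigma_b^2}, \qquad \sigma_a^2 = \frac{1}{N}\sum (x + y\cot\alpha)^2 \sin^2\alpha, \qquad \sigma_b^2 = \frac{1}{N}\sum (x - y\cot\alpha)^2 \sin^2\alpha $$

Expanding, and using Σx²/N = σx², Σy²/N = σy², Σxy/N = rσxσy, cot α = σx/σy, and
sin²α = σy²/(σx² + σy²),

$$ \sigma_a^2 = \sin^2\alpha\left(\frac{\sum x^2}{N} + 2\cot\alpha\,\frac{\sum xy}{N} + \cot^2\alpha\,\frac{\sum y^2}{N}\right) = \frac{2\sigma_x^2\sigma_y^2(1 + r)}{\sigma_x^2 + \sigma_y^2} $$

$$ \sigma_b^2 = \sin^2\alpha\left(\frac{\sum x^2}{N} - 2\cot\alpha\,\frac{\sum xy}{N} + \cot^2\alpha\,\frac{\sum y^2}{N}\right) = \frac{2\sigma_x^2\sigma_y^2(1 - r)}{\sigma_x^2 + \sigma_y^2} $$

Hence

$$ J'^2 = \frac{1 + r}{1 - r}, \qquad r = \frac{J'^2 - 1}{J'^2 + 1} $$

and (R. A. Fisher's)

$$ z = \tfrac{1}{2}\log_e\frac{1 + r}{1 - r} = \log_e J' $$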


(The use of J” (8%) has been added to the original method because the mean 
of J’ and J” empirically gives better results than J’ alone.) 
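The algebra of the proof can be checked numerically. The Python sketch below uses a hypothetical function name. It computes the population variances of the perpendicular distances from lines 5 and 6 for several combinations of σx, σy and r, then compares J' with √((1 + r)/(1 − r)). The agreement for unequal σx and σy shows that the ratio does not depend on the scales of the two variables.

```python
import math


def distance_variances(sd_x, sd_y, r):
    """Variances of perpendicular distances from lines 5 and 6 (both through the means).

    tan(alpha) = sd_y / sd_x, as in the proof. The distances are
    x*sin(a) + y*cos(a) for line 5 and x*sin(a) - y*cos(a) for line 6,
    with x and y measured from their means.
    """
    alpha = math.atan2(sd_y, sd_x)
    s, c = math.sin(alpha), math.cos(alpha)
    cov = r * sd_x * sd_y
    var_a = s * s * sd_x ** 2 + 2 * s * c * cov + c * c * sd_y ** 2
    var_b = s * s * sd_x ** 2 - 2 * s * c * cov + c * c * sd_y ** 2
    return var_a, var_b


cases = [(1.0, 1.0, 0.6), (3.0, 0.5, 0.764), (10.0, 2.0, 0.25)]
for sd_x, sd_y, r in cases:
    var_a, var_b = distance_variances(sd_x, sd_y, r)
    j_prime = math.sqrt(var_a / var_b)
    print(sd_x, sd_y, r, round(j_prime, 4), round(math.sqrt((1 + r) / (1 - r)), 4))
```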


Empirical Tests 


A symmetrical correlation array of 1000 pairs was developed 
from two normal distributions. For this array, the computed and
graphic coefficients were virtually identical:


Computed r = .764
Graphic r  = .763


The pairs were thoroughly shuffled and dealt out into packs
of 50 each. Each pack of 50 pairs was plotted on a separate
scattergraph and the coefficient of correlation determined
graphically and by computation. The pairs were again
thoroughly shuffled and dealt out into packs of 50 each, and the
graphic and computed correlations determined for each set.
The same process was repeated to obtain 20 sets of 100 each.
Figure 3 shows the frequency distributions of computed and
graphic coefficients for the forty sets of 50 pairs and the twenty
sets of 100 pairs. Except for one low erratic value, the graphic
and the computed distributions are comparable. Table 2 shows
the standard deviations measured from the r of the whole
population with one low erratic value omitted. The standard
deviations for graphic and computed values are virtually
identical.


TABLE 2
Standard Deviations with One Erratic Value Omitted

                                     50 pairs    100 pairs
[row label illegible in source]        .059        .042
[row label illegible in source]        .061        .040
[row label illegible in source]        .064        .040

In the original use of the graphic method only the lines
formed by counting in 16% toward the diagonals were used.
In these tests 8% and 24% were also tried. Table 3 shows
comparative standard deviations for 16% alone, for the mean of
8% and 16% ratios, and for the mean of 8%, 16%, and 24%
ratios. The described method (mean of 8% and 16% ratios)
is clearly better than 16% alone and is as good as the mean of
8%, 16%, and 24%, which requires much additional work.


TABLE 3
Comparative Standard Deviations

                              50 pairs    100 pairs
Mean 8% and 16%                 .064        .040
16% alone                       .079        .050
Mean 8%, 16%, and 24%           .067     [illegible]


The standard deviation of the differences between graphic
and computed values is .037 for the 50-pair coefficients and also
for the 100-pair coefficients. This is less than the standard
deviations of the coefficients themselves, showing that graphic
and computed values were very much alike in the arrays used
in these tests.
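The sampling experiment can be repeated in outline with simulated data. This Python sketch works under assumptions and does not reconstruct the original array:

- The population is 1000 hypothetical normal pairs with a correlation near .76.
- graphic_r is a hypothetical percentile-based stand-in for the graphic procedure, with percentiles replacing the counting of points past a straightedge.
- As in the article, two shuffles yield forty packs of 50 and twenty packs of 100.
- Standard deviations are taken about the r of the whole array, and no erratic value is omitted.

```python
import numpy as np


def graphic_r(x, y, cuts=(16, 8)):
    """Percentile-based stand-in for the graphic method (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def spread(v, p):
        lo, hi = np.percentile(v, [p, 100 - p])
        return hi - lo

    w, h = spread(x, 16), spread(y, 16)           # rectangle of lines 1-4
    norm = np.hypot(w, h)
    across_5 = (h * x + w * y) / norm             # distances across diagonal 5
    across_6 = (h * x - w * y) / norm             # distances across diagonal 6
    ratios = []
    for p in cuts:                                # 16% gives J', 8% gives J''
        s5, s6 = spread(across_5, p), spread(across_6, p)
        ratios.append(max(s5, s6) / min(s5, s6))
    j = float(np.mean(ratios))
    r = (j * j - 1) / (j * j + 1)
    return r if spread(across_5, 16) >= spread(across_6, 16) else -r


def pack_sds(x, y, pack_size, shuffles, rng):
    """Deal shuffled pairs into packs; return the SDs of computed and graphic r
    about the whole-array r, and the number of packs."""
    whole_r = np.corrcoef(x, y)[0, 1]
    computed, graphic = [], []
    for _ in range(shuffles):
        order = rng.permutation(len(x))
        for start in range(0, len(x) - pack_size + 1, pack_size):
            idx = order[start:start + pack_size]
            computed.append(np.corrcoef(x[idx], y[idx])[0, 1])
            graphic.append(graphic_r(x[idx], y[idx]))

    def sd_about_whole(values):
        return float(np.sqrt(np.mean((np.asarray(values) - whole_r) ** 2)))

    return sd_about_whole(computed), sd_about_whole(graphic), len(computed)


rng = np.random.default_rng(1)
rho = 0.76                      # hypothetical population correlation
x = rng.standard_normal(1000)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal(1000)
for size in (50, 100):
    sd_c, sd_g, count = pack_sds(x, y, size, shuffles=2, rng=rng)
    print(f"{count} packs of {size}: SD computed {sd_c:.3f}, graphic {sd_g:.3f}")
```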

There is one type of scattergraph, however, where the
graphic coefficient is bound to be quite different. That is the
scattergraph having a generally symmetrical array but with a
few wildly deviant cases off in one corner. In such a case, the
computed coefficient may be greatly disturbed by the few deviant
cases. The graphic coefficient will be affected very little, since
it depends only on counts of points and not on how far the deviant
cases lie from the rest. In this instance, the graphic coefficient
probably yields a better measure of the true relationship in the
whole population from which the sample was drawn.
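A small simulation illustrates this point under assumptions chosen for the purpose. The data are 500 hypothetical normal pairs with r near .5, plus five deviant cases placed far off in the upper left corner. In this Python sketch the computed coefficient changes drastically, while a percentile-based stand-in for the graphic method (the hypothetical graphic_r) barely moves. Percentiles depend on how many points lie beyond a line, not on how far beyond it they lie.

```python
import numpy as np


def graphic_r(x, y, cuts=(16, 8)):
    """Percentile-based stand-in for the graphic method (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def spread(v, p):
        lo, hi = np.percentile(v, [p, 100 - p])
        return hi - lo

    w, h = spread(x, 16), spread(y, 16)           # rectangle of lines 1-4
    norm = np.hypot(w, h)
    across_5 = (h * x + w * y) / norm             # distances across diagonal 5
    across_6 = (h * x - w * y) / norm             # distances across diagonal 6
    ratios = []
    for p in cuts:
        s5, s6 = spread(across_5, p), spread(across_6, p)
        ratios.append(max(s5, s6) / min(s5, s6))
    j = float(np.mean(ratios))
    r = (j * j - 1) / (j * j + 1)
    return r if spread(across_5, 16) >= spread(across_6, 16) else -r


rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = 0.5 * x + np.sqrt(0.75) * rng.standard_normal(500)

# Hypothetical: five wildly deviant cases far off in the upper left corner.
x_dev = np.concatenate([x, np.full(5, -10.0)])
y_dev = np.concatenate([y, np.full(5, 10.0)])

computed_before = np.corrcoef(x, y)[0, 1]
computed_after = np.corrcoef(x_dev, y_dev)[0, 1]
graphic_before, graphic_after = graphic_r(x, y), graphic_r(x_dev, y_dev)
print(f"computed r: {computed_before:.3f} -> {computed_after:.3f}")
print(f"graphic r:  {graphic_before:.3f} -> {graphic_after:.3f}")
```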

MEASUREMENT NEWS*


Papers relating to the field of measurement constituted more than 
half of those presented at the meeting of the Military Division of the 
American Psychological Association at the University of Maryland 
on November 27 and 28. The program included the following papers 
directly concerned with the field: 

“Equivalences between Army and Civilian Tests”, C. P. Sparks. 

“The Army General Classification Test”, Staff, Personnel Re- 
search Section, Classification and Replacement Branch, AGO, read 
by R. H. Bittner. 

“Correction for Restricted Range”, E. G. Brundage. 

“The Objective Measurement of Flying Skill”, A. C. Tucker. 

“The Selection of Marine Officer Candidate”, S. B. Williams. 

“Surveys of Opinions on Training”, C. R. Pace and D. L. Gibson. 

“Scale and Intensity Analysis for Attitude, Opinion and Achieve- 
ment”, Louis Guttman. 

“The Form of Items and the Distributions of False Positive Scores 
on a Neurotic Inventory”, W. A. Owens. 

“War Weariness and Morale in Air Groups”, J. G. Darley. 

“Morale Surveys in the Army”, Carl Hovland. 

“The Criterion in Army Personnel Research”, Staff, Personnel 
Research Section, Classification and Replacement Branch, AGO, read 
by E. D. Sisson. 

“The Nominating Technique”, C. L. Vaughn. 

“The Use of Order-of-Merit Rankings as a Criterion of Shipboard 
Performance of Enlisted Personnel”, H. P. Bechtoldt, D. B. Stuit, 
and J. W. Haucker. 

“Criteria of Air Crew Proficiency in Operational Training”, L. B. 
Ward. 

“Selection of Army Officers”, M. W. Richardson.

“The Significance of Case History Items as Detectors of Potential 
Naval Delinquent”, H. F. Hunt and Nathan Goldman. 

“Assessment of the Whole Person: Procedures used in Testing the 
Suitability of 5,500 OSS Recruits”, H. A. Murray. 

“Test Procedures for the Psychiatric Screening of Naval Per- 
sonnel: Some Problems in Method”, Milton Wexler. 

“Development of an Interview for Selection Purposes”, Staff, 
Personnel Research Section, Classification and Replacement Branch, 
AGO, read by E. A. Rundquist. 


* Readers are invited to send notes for this section to the Editor, EDUCATIONAL
AND PSYCHOLOGICAL MEASUREMENT, 917 Fifteenth Street, N.W., Washington 5,
D.C.

The papers presented at the meeting and the discussion concerning
them are to be published by the University of Maryland. Copies
of the proceedings may be ordered from Professor G. A. Kelley,
Department of Psychology, The University of Maryland, College Park,
Maryland.





The preparation of a comprehensive handbook on educational 
measurement has been undertaken under the sponsorship of the 
Committee on Measurement and Guidance of The American Council 
on Education. The board of editors is composed of W. W. Cook, 
— C. Flanagan, E. F. Lindquist, chairman, Irving Lorge, T. R. 
McConnell, Philip J. Rulon, Donald J. Shank, John M. Stalnaker,
Ralph Tyler, Kenneth Vaughn, and Ben D. Wood, with the chairman 
serving as editor-in-chief. Each of the twenty chapters is to be pre- 
pared by a specialist in the particular field covered, assisted by 
several collaborators. A production schedule has been outlined call- 
ing for sending final copy to the printers July 1, 1947. 





An Evaluation Service Center has been established at Syracuse 
University, with Professor Maurice E. Troyer as director. The pur- 
poses of the Center are (1) to help faculty members in their effort to 
improve appraisal of student progress, (2) to assist faculty members 
in constructing tests, (3) to make analyses of tests, (4) to keep and 
make available to staff members an up-to-date file of sample published 
and unpublished tests in the various subject areas, (5) to keep up- 
to-date and make available to staff members a library of references 
on problems and procedures of measurement and evaluation in higher 
education, (6) to conduct seminars in problems of evaluation for 
staff members, departmental assistants and scholars interested in 
systematic and comprehensive study of test construction and inter- 
pretation, (7) to assist with research, (8) to encourage study and 
publication, by faculty members, of new and better methods of 
appraisal and instruction, and (9) to serve, through the staff of the 
Center and other members of the University Faculty on a fee basis, 
as consultant to other colleges and universities in the area on prob- 
lems of appraisal and instruction. 





The firm of Richardson, Bellows, Henry and Co., Inc., has been 
established at 56 Beacon Street, New York, to conduct surveys and 
research on problems of selection, placement, training, and employee 
morale for business and industrial concerns. It will do job analyses 
from the standpoint of qualification requirements and training needs; 
design application blanks, recommendation blanks, controlled inter- 
viewing procedures, aptitude tests, information tests, interest and 
personality tests, and training outlines and manuals; develop merit 
rating systems and systems for combining scores (by means of the 
usual multiple-correlation validation studies); make clinical ap- 
praisals of executive personnel; conduct attitude surveys; design 
personnel record systems and personnel statistical reporting systems;
and make over-all surveys of personnel programs. The membership
consists of Roger M. Bellows, Francis F. Bradshaw (President),
Edward E. Cureton (Secretary-Treasurer), Douglas H. Fryer, Edwin
R. Henry, Hermann H. Remmers, Marion W. Richardson (Chairman
of the Board of Directors), Carroll L. Shartle, and Robert J. Wherry
(Vice President).


A counseling center has been established at the University of 
Chicago under the jurisdiction of the Dean of Students, Carl R. 
Rogers, to provide the services enumerated below. The volume of 
these services will be governed in accordance with the best interests of 
a sound program of professional education and research, carried out 
in cooperation with various interested departments and schools. 





Services 


1. To provide adjustment counseling to students, veterans,
industrial workers, and other individuals and groups.

2. To provide a diagnostic service, using tests, interviews, and
other techniques.

3. To refer individuals to appropriate University services and
agencies.

4. To assist in the coordination of specialized counseling functions
on the campus.

5. To promote the development of in-service training programs
with those groups interested in improving their counseling
skills.


William W. Blaesser, Treasurer of the American College Per- 
sonnel Association and formerly of the University of Wisconsin, is 
now Assistant Dean of Students and Director of the Counseling 
Center at the University of Chicago. 

Lieutenant Colonel J. P. Guilford has returned to the position of
Professor of Psychology at the University of Southern California 
after almost four years with the Army Air Forces. In his last assign- 
ment he was Chief of the Department of Records and Analysis of the 
AAF School of Aviation Medicine, Randolph Field. The Depart- 
ment of Records and Analysis fell heir to the accumulated answer 
sheets, card records and data from all the examining and research 
units in aircrew classification. Besides turning out a number of final 
reports the organization completed the writing of a 29-chapter volume 
on Printed Aircrew Classification Tests, one of 15 volumes which are
scheduled to be written to report the results of the AAF Psychological 
Research Program. 


Dr. Harold C. Taylor has been appointed director of the W. E. 
Upjohn Institute for Community Research of Kalamazoo, Michigan. 
One of the major objectives of the Institute is to investigate the 
“suitability of opportunity for gainful employment: its relationship 
to the aptitudes, skills and interests of people; and its relationship to
the satisfactions, monetary and otherwise, which people desire to 
obtain from their jobs.” Dr. Charles C. Gibbons is leaving his posi- 
tion as Director of Personnel Research of the Owens-Illinois Glass 
Company to become Assistant Director of the Institute. 


Professor John M. Stalnaker has left his position with the College 
Entrance Examination Board and Princeton University to become 
Dean of Students and Professor of Psychology at Stanford Univer- 
sity. Mr. Henry Chauncey has been appointed Associate Secretary 
and Professor Harold O. Gulliksen has been appointed Research 
Secretary of the College Entrance Examination Board. 


Lieutenant Commander D. D. Feder has returned to his civilian 
position as Executive Officer and Supervisor of the Illinois State Civil 
Service Commission. His last billet in the Navy was that of Officer- 
in-Charge, Radio Material Unit, Test and Research Division, Bureau 
of Personnel. 


Lieutenant Colonel Paul Horst has left the Army to return to the 
Procter and Gamble Company, where he is now Director of Personnel 
Research. 

MEASUREMENT ABSTRACTS*

Berdie, Lt. R. F. “Range of Interests.” Journal of Applied Psychology, XXIX
(1945), 268-281.

An interest scale based on 22 items was found to differentiate clearly between 
recruits who could be expected to adjust to military training and those who could 
not. As against the orally presented list, the printed list proved of greater conveni- 
ence and objectivity. Age and educational factors must be taken into account in any 
analysis of the results. Supplementing the psychiatric screening technique and the 
interview, this range of interests test offers a satisfactory method of predicting adjust- 
ment. Vernon S. Tracht. 








Challman, Robert C. “The Validity of the Harrower-Erickson Multiple Choice Test
as a Screening Device.” Journal of Psychology, XX (1945), 41-48.

The Harrower-Erickson Multiple Choice Test was designed as a selective device 
for military use. The procedure consists of offering subjects 10 choices for each of 
the 10 Rorschach cards. Half of the choices are considered representative of indi- 
viduals suffering from mental abnormalities. If the subject considers none of the 
choices applicable, he is advised to submit an alternate. The critical score is based 
on 4, 5, or 6 abnormal responses, depending upon the degree of selectivity desired. 
Three methods of scoring were suggested. In Method I all alternates are classed as 
abnormal; in Method II alternates are scored abnormal only when characterized by 
poor form or when bizarre in content; in Method III abnormal and alternate responses 
are weighted. Harrower-Erickson found the procedure valuable as a screening de- 
vice. However, later studies, including the one described in this article, do not indi- 
cate a sufficiently sharp distinction between the responses of the normal and the 
abnormal to warrant the acceptance of the method as more than an auxiliary to be 
used with a personality inventory. Helen Heath. 





Forlano, G. and Kirkpatrick, F. H. “Intelligence and Adjustment Measurements in 
the Selection of Radio Tube Mounters.” Journal of Applied Psychology, XXIX 
(1945), 257-261. 

This study was concerned with the problem of effectiveness of intelligence and 
adjustment tests in bringing about increased worker efficiency in radio tube mount- 
ing jobs. Subjects in the experiment were 20 female tube mounters. Tests used 
were (1) the Otis Self-Administering Test of Mental Ability, Form B; (2) the social 
scale of the Bell Adjustment Inventory; (3) the alienation scale of the Washburne
Social Adjustment Inventory. The criterion used for the experiment was ratings by 
the supervisor in charge of the group. It is concluded that low intelligence scores 
tend to indicate poorer workers but average scores or above do not discriminate 
between “good” and “fair” workers, while scores in social adjustment do differentiate 
“good” and “fair” workers. A composite of intelligence and personality scores is 
therefore effective in predicting success of new tube mounters. Frances Smith. 





Geil, George A. “A Clinically Useful Abbreviated Wechsler-Bellevue Scale.” Jour- 
nal of Psychology, XX (1945), 101-108. 
Selection of a shortened form of the Bellevue full scale, which would meet 
requirements of the clinical situation for time economy, accuracy of intelligence 


* Edited by Forrest A. Kingsbury. 
rating, and diagnostic screening capacity, was made on the basis of analysis of test
records of a group of 250 unselected cases examined by the Wechsler-Bellevue full 
scale at the Medical Center for Federal Prisoners, Springfield, Mo. Mean weighted 
scores for each of the 10 subtests of the scale, and mean total full scale weighted sub- 
test score were determined, and mean total weighted subtest scores for trial combi- 
nations of selected subtests were then computed. Trial combinations were retained 
which showed close alignment of mean total weighted subtest scores with that indi- 
cated for the full scale, and were tested for accuracy by comparison of calculated IQ 
scores on the combinations for each of the 250 cases with the actual full scale IQ 
scores. An abbreviated scale composed of the tests of Comprehension, Similarities, 
Digits, and Block Design, shows a correlation of .966 ± .003 with the IQ’s of the full
scale, and is found to be reliable as a screening instrument. Frances Smith. 
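The ± figure after .966 is consistent with the probable error customarily reported at the time. The abstract does not say which quantity it is. Assuming it is the probable error, a quick check in Python with N = 250 follows. The function names are hypothetical, and the standard error is shown for comparison.

```python
import math


def probable_error_r(r, n):
    """Probable error of a correlation coefficient: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


def standard_error_r(r, n):
    """Standard error of a correlation coefficient: (1 - r**2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)


r, n = 0.966, 250
print(f"PE = {probable_error_r(r, n):.3f}, SE = {standard_error_r(r, n):.3f}")
# prints "PE = 0.003, SE = 0.004"
```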





Gulliksen, Harold. “The Relation of Item Difficulty and Inter-Item Correlation to
Test Variance and Reliability.” Psychometrika, X (1945), 79-91.

Under assumptions that will hold for the usual test situation, it is proved that 
test reliability and variance increase (a) as the average inter-item correlation in- 
creases, and (b) as the variance of the item difficulty distribution decreases. As the 
average item variance increases, the test variance will increase, but the test reliability 
will not be affected. (It is noted that as the average item variance increases, the
average item difficulty approaches .50.) In this development, no account is taken
of the effect of chance success, or the possible effect on student attitude of different 
item difficulty distributions. In order to maximize the reliability and variance of 
a test, the items should have high intercorrelations, all items should be of the same 
difficulty level, and the level should be as near to 50% as possible. (Courtesy 
Psychometrika.) 





ney, Lt. H. M. “Single-Item Tests for Psychometric Screening.” Journal of
Applied Psychology, XXIX (1945), 262-267.

This article describes how a series of 10 Single-Item Tests, selected from 30 experimental
items from well-known mental tests by a special scoring technique, were devised
for use in the Navy’s psychometric screening program. A successful response to 
any one of these tests indicated mental ability above the minimum—M.A. of 11 
years—arbitrarily selected by naval officials. Taking from 1 to 20 minutes per man 
to administer, and not requiring optimal testing conditions, they were standardized 
on 1500 cases; and, in a 3 months’ trial period, exhibited an error in the selection of 
recruits of only 1/10 of 1 per cent. Vernon S. Tracht. 





Hult, Esther. “Study of Achievement in Educational Psychology.” The Journal of
Experimental Education, XIII (1945), 174-190.

The purpose of this study was to find the relationship between practice teaching 
success and measures of ability considered prerequisite for this success as determined 
by various tests. The criteria were (1) practice teaching marks, and (2) ratings by 
supervisors. No significant relationships were found between practice teaching marks, 
general knowledge and mental ability. The multiple correlation between the several 
factors and the criterion was not high enough for individual prediction but there was 
a significant relationship between success in teaching and the grade point average. 
Shortly before the end of the semester the practice teachers rated their theory course 
and teacher, and practice course and teacher. Of the two courses, they tended to 
rate their practice course higher; and of the two teachers, their practice teacher 
lower. Howard M. Schuman. 





Johnson, Palmer O. and Tsao, Fei. “Factorial Design and Covariance in the Study
of Individual Educational Development.” Psychometrika, X (1945), 133-162.
This is the report of the application of the principles of factorial design to an
investigation of individual educational development. The specific type of factorial
design formulated was a 2 × 3 × 3 × 3 arrangement, that is, the effect of sex, grade
location, scholastic standing, and individual order, singly and in all possible
combinations, was studied in relation to educational development as measured by the Iowa
Tests of Educational Development. An application of the covariance method was
introduced which resulted in increased precision of this type of experimental design 
by significantly reducing experimental error. The two concomitant measures used 
to increase the sensitiveness of the experiment were initial status of individual devel- 
opment and mental age. Without these statistical controls all main effects and two 
first-order interactions would have been accepted as significant. With their use only 
sex (doubtful), scholastic standing, and individual order demonstrated significant 
effects. The chief beauty of the analysis of variance and covariance as an integral 
part of a self-contained experiment is demonstrated in the complete single analysis 
of the data. The statistical utilization of the experimental results has also been 
developed for purposes of estimation and prediction. The mathematical statistician 
is being continuously required to develop and analyze experimental designs of in- 
creasing complexity since the introduction of the analysis of variance and covariance. 
The mathematical formulation and solution of the problem of this investigation are
carried out. The methods illustrated and explained in this study, and modifications
and extensions of them are capable of very wide application. The general principles 
can be used to various degrees and in a number of ways. (Courtesy Psychometrika.) 
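
The way a concomitant measure reduces experimental error can be seen in the simplest case, a one-way analysis of covariance with one covariate. The sketch below is a minimal illustration under stated assumptions, not the authors' 2 × 3 × 3 × 3 computation: the function name, the two groups, and the scores are hypothetical. It pools the within-group regression of the criterion on the covariate, adjusts each group mean to the common covariate mean, and compares the within-group error sum of squares before and after adjustment.

```python
from statistics import mean

def ancova_one_covariate(groups):
    """One-way analysis of covariance with a single covariate (hypothetical helper).

    groups: dict mapping group label -> list of (x, y) pairs, where x is the
    concomitant measure (e.g. initial status) and y the criterion score.
    Returns the pooled within-group slope, covariate-adjusted group means,
    and the within-group error sum of squares without and with adjustment.
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)

    # Pooled within-group sums of squares and cross-products
    sxx = sxy = syy = 0.0
    for pairs in groups.values():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        sxx += sum((x - mx) ** 2 for x, _ in pairs)
        sxy += sum((x - mx) * (y - my) for x, y in pairs)
        syy += sum((y - my) ** 2 for _, y in pairs)

    slope = sxy / sxx  # pooled within-group regression coefficient of y on x

    # Adjust each group mean to what it would be at the grand covariate mean
    adjusted = {}
    for label, pairs in groups.items():
        mx = mean(x for x, _ in pairs)
        my = mean(y for _, y in pairs)
        adjusted[label] = my - slope * (mx - grand_x)

    error_ss_unadjusted = syy                 # within-group SS of y alone
    error_ss_adjusted = syy - sxy ** 2 / sxx  # SS left after regression on x
    return slope, adjusted, error_ss_unadjusted, error_ss_adjusted


# Made-up scores: within each group, y is exactly 2x plus a group effect.
data = {"A": [(1, 2), (2, 4), (3, 6)], "B": [(4, 9), (5, 11), (6, 13)]}
slope, adjusted, ss_before, ss_after = ancova_one_covariate(data)
```

With these made-up data the raw group means (4 and 11) differ by 7 points, but the covariate-adjusted means (7 and 8) differ by only 1, and the error sum of squares falls from 16 to 0 because the criterion is an exact linear function of the covariate within groups. Real data would show a partial, not total, reduction.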








Kaitz, Hyman B. “A Note on Reliability.” Psychometrika, X (1945), 127-131. 

A formula for internal consistency reliability is developed within the framework 
of the analysis of variance. The test items are assumed to be homogeneous, but may 
have any weights. Data needed for computation are the student test scores and the 
total number of items answered so as to have the same weight. It is shown that this 
formula reduces to the Kuder-Richardson formula for item weights of one and zero. Some
empirical validation is offered. (Courtesy Psychometrika.) 
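
The abstract does not reproduce Kaitz's formula. The sketch below therefore substitutes a closely related, well-known analysis-of-variance coefficient, Hoyt's persons-by-items formulation with every item given unit weight, and checks the property the abstract describes: for items scored one or zero it agrees with Kuder-Richardson formula 20. The function names and the small response matrix are hypothetical.

```python
def kr20(scores):
    """Kuder-Richardson formula 20 for a persons x items matrix of 0/1 scores."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # proportion answering item j correctly
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


def anova_reliability(scores):
    """Reliability from a two-way (persons x items) analysis of variance,
    1 - MS_residual / MS_persons (Hoyt's formulation, unit item weights)."""
    n, k = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in row_means)
    ss_items = n * sum((m - grand) ** 2 for m in col_means)
    ss_resid = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_resid = ss_resid / ((n - 1) * (k - 1))
    return 1 - ms_resid / ms_persons


# Hypothetical responses of six students to four items scored 1 (right) or 0 (wrong)
responses = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0],
             [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0]]
```

For this matrix both functions return about .78; the two expressions are algebraically identical for 0/1 items, which is the kind of reduction the abstract reports.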





Martin, Howard G. “The Construction of the Guilford-Martin Inventory of Factors 
G-A-M-I-N.” Journal of Applied Psychology, XXIX (1945), 298-300. 
Factor analyses have isolated five new temperament traits: (G) general pressure
for overt activity, (A) social ascendancy, (M) masculinity of attitudes and
interests, (I) self-confidence, and (N) lack of nervous tension. Over 300 items, answered
“Yes,” “No,” or “?”, were administered to 250 men and 250 women, all college
students between the ages of 19 and 30. Items shown by factor and item analyses to
have heavy loadings in a trait were used on the preliminary scoring key. Four
hundred tests taken by 200 men and 206 women were scored with this preliminary
key, and the highest 100 and lowest 100 cases were used as criterion groups for
factors G, A, I, and N. Factor M was based on scores of the highest 100 males and the
lowest 100 females. Split-half reliability on the five traits ranged from .85 to .91.
Howard M. Schuman. 
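
The abstract reports split-half reliabilities without saying how the halves were formed. A common procedure, assumed here, is to score the odd- and even-numbered items separately, correlate the two half-scores, and step the correlation up to full test length with the Spearman-Brown formula, r_full = 2r / (1 + r). The function names are illustrative only.

```python
import math

def pearson(x, y):
    """Product-moment correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Estimated full-length reliability from the correlation between two halves."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one list of item scores per person.
    Assumption: halves are formed from odd- and even-numbered items."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson(odd, even))
```

For example, a correlation of .80 between the two half-scores steps up to spearman_brown(0.80), about .89, for the whole test.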





Newman, Joseph. “The Prediction of Shopwork Performance in an Adult Rehabilitation
Program. The Kent-Shakow Industrial Formboard Series.” Psychological
Record, V (1945), 343-352.

An investigation of the value of the K-S Formboard Series for predicting per- 
formance in shopwork was conducted by means of a study extending over two years, 
with results based on data obtained from 111 male patients in a New York sana- 
torium who took part in a rehabilitation program. Subjects were given the K-S Form- 
board before assignment to the wood-working shop; shopwork progress was determined 
by means of rating scales and subjects were also ranked in ascending order according 
to total time score on the K-S Formboard. Formboard results were studied to deter- 
mine their relationship to shopwork ratings. It is concluded that the K-S Form- 
board is of value for predicting performance in shopwork in an adult re-education 
program. A differentiating score for the Formboard is a total time of 25 minutes 
or less. A correlation coefficient (tetrachoric) of .76 was found between shop ratings 
in accuracy, speed, and constructive thinking, and total time scores. Frances Smith. 
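
The abstract reports a tetrachoric coefficient without describing how it was computed. A quick approximation often used for a 2 × 2 table with cell frequencies a, b, c, d (a and d the concordant cells) is the cosine-pi formula, r_t ≈ cos(π / (1 + √(ad/bc))). The sketch below is an assumption, not Newman's procedure: it dichotomizes formboard times at the 25-minute cutoff named in the abstract and shop ratings at a hypothetical "satisfactory" threshold, then applies that approximation. Full estimation from the bivariate normal surface is more exact.

```python
import math

def cos_pi_tetrachoric(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric r for the 2x2 table
         [[a, b],
          [c, d]]
    where a and d are the concordant cells."""
    if a * d == 0 and b * c == 0:
        raise ValueError("table too sparse to estimate r")
    if b * c == 0:
        return 1.0   # no discordant cases: perfect positive association
    if a * d == 0:
        return -1.0  # no concordant cases: perfect negative association
    return math.cos(math.pi / (1 + math.sqrt(a * d / (b * c))))


def table_from_scores(times, ratings, time_cutoff=25, rating_cutoff=3):
    """Cross-classify fast formboard time (<= time_cutoff minutes) against a
    high shop rating. rating_cutoff is a hypothetical 'satisfactory' threshold."""
    a = b = c = d = 0
    for t, r in zip(times, ratings):
        fast, good = t <= time_cutoff, r >= rating_cutoff
        if fast and good:
            a += 1   # fast and rated well
        elif fast:
            b += 1   # fast but rated poorly
        elif good:
            c += 1   # slow but rated well
        else:
            d += 1   # slow and rated poorly
    return a, b, c, d
```

A table of 40, 10, 10, 40, for instance, gives ad/bc = 16 and r_t ≈ cos(π/5), about .81.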




Peel, E. A. “On Identifying Aesthetic Types.” British Journal of Psychology,
XXXV (1945), 61-69.

This paper outlines a method for estimating aesthetic preference with reference 
to artistic quality rather than to the temperamental traits involved. Items were arranged
by expert judges according to aesthetic criteria and the subjects’ orders of aesthetic 
preference were compared with these criteria by means of correlation. The correla- 
tions were then analyzed into factors characterizing the group of subjects and the 
criteria. Three different matrices of correlation coefficients were obtained: correla- 
tions between orders of “liking” for the subjects, correlations between the orders of 
“liking” and the criteria, and correlations between the criteria. Frances Smith. 
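
Comparing a subject's order of preference with the judges' criterion order is a correlation between two rankings. The abstract does not name the coefficient; Spearman's rank-difference formula, ρ = 1 − 6Σd² / (n(n² − 1)), is assumed here as the usual choice for untied ranks. The function name and item labels are hypothetical.

```python
def spearman_rho(order_a, order_b):
    """Rank-difference correlation between two orderings of the same items
    (no ties). Each argument lists item labels from most to least liked."""
    if sorted(order_a) != sorted(order_b):
        raise ValueError("both orders must rank the same items")
    n = len(order_a)
    rank_b = {item: i for i, item in enumerate(order_b)}
    # d is the difference between an item's positions in the two orders
    d2 = sum((i - rank_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


criterion = ["P1", "P2", "P3", "P4", "P5"]   # hypothetical judges' order
subject = ["P2", "P1", "P3", "P4", "P5"]     # hypothetical subject's order
```

Here the subject reverses only the top two items, so Σd² = 2 and ρ = 1 − 12/120 = .90; coefficients of this kind, one per subject and criterion, would fill the matrices that are then factored.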





Rabin, Albert I. “The Use of the Wechsler-Bellevue Scales with Normal and Abnormal
Persons.” Psychological Bulletin, XLII (1945), 410-422.

Findings of various investigators employing the Wechsler-Bellevue Scale are 
coordinated and summarized and suggestions are offered for future treatment of the 
data rapidly being accumulated. Correlations between this scale and other tests of 
intelligence and achievement are reported, and the diagnostic effectiveness of the 
scale is substantiated. The scale is demonstrated to be effective in group pattern- 
analysis because of the functional unity of its subtests; and the possibility of achiev- 
ing a diagnostic tool in individual cases through more effective control of major 
factors involved is discussed. Retest data with clinical material and miscellaneous 
studies are reported and the need for long-range retest studies and further investiga- 
tion by the method of factor analysis in different clinical groups and at different age 
levels is emphasized. Frances Smith. 





Sarason, E. and Sarason, S. “A Problem in Diagnosing Feeble-Mindedness.” Journal
of Abnormal and Social Psychology, XL (1945), 323-329.

Criteria are formulated by means of which a clinical psychological report may 
be judged in the diagnosis of mental deficiency. These criteria are: (1) inclusion in the
psychological examination of several individually administered tests of intelligence;
(2) use of projective techniques to clarify the relation between intelligence
and personality; (3) internal analysis of each test; (4) interpretation of test func- 
tioning as part of a continuous behavioral sequence; (5) integration of information 
obtained from the various tests. Need for care in acceptance of numerical scores in 
doubtful and near-borderline cases is illustrated by presentation of the complete 
psychological report of a particular case. Frances Smith. 





Shakow, D., Rodnick, E. H., and Lebeaux, T. “A Psychological Study of a Schizo- 
phrenic: Exemplification of a Method.” Journal of Abnormal and Social Psy- 
chology, XL (1945), 154-174. 

A collection of 8 psychological devices: (1) Stanford-Binet or Wechsler-Bellevue 
Intelligence Scale; (2) Rorschach Test; (3) Association Test; (4) Pinboard Aspira- 
tion Test; (5) Thematic Apperception Test; (6) Targetball-Thematic Test; (7) 
Pursuitmeter-Stress Test; and (8) Picture-Frustration Test were employed as one 
aspect of a comprehensive study of neuropsychiatric patients who had been in service 
in the armed forces. The specific aim of the psychological analysis was to construct 
individual profiles and also to differentiate patient groups from each other and from 
normal groups. One case was reviewed in detail. Helen Heath. 





Thurstone, L. L. “A Multiple Group Method of Factoring the Correlation Matrix.”
Psychometrika, X (1945), 73-78.

There are a number of methods of factoring the correlation matrix which require 
the calculation of a table of residual correlations after each factor has been extracted. 
This is perhaps the most laborious part of factoring. The method to be described 
here avoids the computation of residuals after each factor has been computed. Since 
the method turns on the selection of a set of constellations or clusters of test vectors, 
it will be called a multiple-group method of factoring. The method can be used for 
extracting one factor at a time if that is desired but it will be considered here for the 
more interesting case in which a number of constellations are selected from the
correlation matrix at the start. The result of this method of factoring is a factor
matrix F which satisfies the fundamental relation FF′ = R. (Courtesy Psychometrika.)
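
A minimal sketch of the multiple-group idea follows, under stated assumptions: the clusters are chosen in advance, the diagonal of R holds communalities, and the orthogonal factor matrix is obtained by one standard route, a Cholesky (square-root) factorization of the group-sum matrix. With G the tests-by-groups indicator matrix, S = RG holds each test's summed correlations with every group, T = G′RG holds the group-by-group sums, and F = S(L′)⁻¹ with T = LL′, so that FF′ = ST⁻¹S′. All groups are used at once, so no residual table is computed between factors. When R has exactly as many common factors as groups, FF′ reproduces R; otherwise R − FF′ is the residual matrix examined afterwards. The helper names and the six-test loading matrix are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def cholesky(T):
    """Lower-triangular L with T = L L' (T symmetric positive definite)."""
    m = len(T)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(T[i][i] - s)
            else:
                L[i][j] = (T[i][j] - s) / L[j][j]
    return L


def multiple_group_factors(R, groups):
    """Orthogonal factor matrix F from R (communalities in the diagonal) and a
    list of test clusters such as [{0, 1, 2}, {3, 4, 5}]."""
    n, m = len(R), len(groups)
    G = [[1.0 if i in groups[g] else 0.0 for g in range(m)] for i in range(n)]
    S = matmul(R, G)             # S[i][g]: sum of test i's correlations with group g
    T = matmul(transpose(G), S)  # T[g][h]: sum of correlations between groups g and h
    L = cholesky(T)
    F = []
    for s in S:                  # solve f L' = s (i.e. L f = s) by forward substitution
        f = [0.0] * m
        for j in range(m):
            f[j] = (s[j] - sum(L[j][k] * f[k] for k in range(j))) / L[j][j]
        F.append(f)
    return F


# Hypothetical two-factor loadings for six tests; R = A A' has rank 2.
A = [[0.8, 0.1], [0.7, 0.2], [0.6, 0.0], [0.1, 0.7], [0.2, 0.8], [0.0, 0.6]]
R = matmul(A, transpose(A))
F = multiple_group_factors(R, [{0, 1, 2}, {3, 4, 5}])
FFt = matmul(F, transpose(F))
max_residual = max(abs(x - y) for fr, rr in zip(FFt, R) for x, y in zip(fr, rr))
```

Because this illustrative R is built with exactly two common factors and the two clusters separate them, max_residual is zero to rounding error; with real test data the leftover R − FF′ would be inspected for further factors.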


Walker, K. F., Staines, R. G., and Kenna, J. C. “The Influence of Scoring Methods
upon Score in Motor Perseveration Tests.” British Journal of Psychology, 
General Section, XXXV (1945), 51-60. 

Spearman based his theory of mental inertia on the results of motor persevera- 
tion tests. Certain features in the construction and scoring of these tests invalidate 
these results. The two types of these tests are (1) creative effort and (2)
alternation. The five methods of scoring these tests are: (a) X-Y, (b) X/Y, (c)
X-X/X, (d) X+Y/2XY, and (e) E/A, where X and Y are scores on two interfering tasks,
E is the expected score, and A the actual score. Methods (a), (b), and (c) corre- 
late highly with each other but low or negatively with methods (d) and (e). Spear- 
man’s general interference factor, found when method (d) is used, disappears when 
method (e) is used. When initial differences in ease of performing the two activities 
are great, the creative effort test using method (d) does not measure difficulty in 
alternation but only the difference in ease of performance. The initial difference in 
ease of performance is not related to ease of alternation of the two activities. Howard 
M. Schuman. 


Werner, Heinz (with the collaboration of Doris Carrison). “Perceptual Behavior of
Brain-Injured, Mentally Defective Children: An Experimental Study by Means
of the Rorschach Technique.” Genetic Psychology Monographs, XXXI (1945),
3-110.

Experimental analysis of perceptual and conceptual behavior of brain-injured 
and non-brain-injured subnormal children of comparable mental ages was conducted
by means of the Rorschach technique. Significant differences in response were 
found, behavior of brain-injured children being characterized by disintegrative ten- 
dencies, forced responsiveness to sensory stimulation, lack of affective motor-control, 
lack of associational control, meticulosity and perseverations. Interpretation of these 
responses is made in the light of previous studies including experiments with simi- 
larly formed groups of children, work on the Rorschach test with brain-injured adults, 
and general studies of responses to the Rorschach test. Characteristic clusters of 
behavior traits of brain-injured children are deduced from this analysis. Frances 
Smith. 











